PROCESSES AFFECTING SCORES ON “UNDERSTANDING OF
OTHERS” AND “ASSUMED SIMILARITY”


LEE J. CRONBACH 
College of Education, University of Illinois 


How one person judges another is a 
problem important for its theoretical 
implications and for its practical sig- 
nificance in group psychology, assess- 
ment, teaching, etc. Recent studies 
of “social perception,” as this area
may be termed, have been chiefly con- 
cerned with differences among per- 
ceivers either in terms of their ac- 
curacy or in terms of their tendency 
to view others as similar to them- 
selves. 

These studies have usually been built around a particular
operation in which a Judge (J) “predicts” how another person (O)
will respond. Often, for example, both persons describe themselves
on a personality inventory, and J is then asked to fill out the
inventory as he thinks O did. The extent to which the prediction
agrees with O’s actual response is taken as a measure of J’s
accuracy of social perception (or “empathy,” “social sensitivity,”
“diagnostic competence,” etc.). Scores obtained in this manner are
difficult to interpret, and several investigators have reported
distressingly low consistency for them (10, 15, 32). Although this
paper discusses only the perception of Others, many of the findings
have relevance also to studies of “insight” where a comparison is
made between the subject’s self-rating and the rating given him by
Others.

1 Appreciation is expressed to Mary E. Ehart, who assisted in all
stages of this paper from initial conception to final
interpretation, and to Urie Bronfenbrenner and associates for
helpfully providing data and for their courtesy in exchanging ideas
throughout our rather similar investigations. This study was
conducted under ONR Contract N6ori-07135, of which Fred E. Fiedler
is now principal investigator.

This paper seeks to disentangle 
some of the many effects which con- 
tribute to social perception scores, 
and to identify separately measur- 
able components. This analysis (a) 
shows that investigators run much 
risk of giving psychological interpre- 
tation to mathematical artifacts when 
they use measures which combine 
the components, (b) directs attention 
to some especially interesting aspects 
of social perception left untouched 
by the usual approach, and (c) sug- 
gests new ideas regarding the practi- 
cal use of judgments in clinic, school, 
and other places. 

Our analysis of social perception 
scores may be instructive regarding 
research strategy generally. This area 
of research has developed in an ultra- 
operationalist manner; of late, work- 
ers have seemed content to regard 
“empathy” as “what empathy tests
measure.” The principal research
activity has been correlating “empathy,”
so defined, with other variables.
We shall show, however, that
the operation involves many unsus- 
pected sources of variation, so that 
scores are impure and results uninter- 
pretable (see also 16). Studies based 
on myopic operationism are largely
wasted effort when the operation 
does not correspond to potentially 
meaningful constructs. Defining a 
measure operationally is only a pre- 
liminary to analytic studies which 
can refine the measure and bring it 
closer to the intended construct. 

Our report on a specialized area 
of perceptual research shares much 
of the perspective of Postman’s 
important general review of percep- 
tion (26). His remarks are peculiarly 
pertinent to studies of social percep- 
tion, even though he was thinking 
more of the “New Look” studies
of perception of words and objects: 

At this juncture of debate, we shall do well 
to pull up short a moment and reconsider the 
fundamental operations of our perceptual ex- 
periments, particularly as they bear on the 
validity of the theoretical constructs linking 
perception to motivation and personality. 
... Experiments have shared a common 
tendency which may be called the projective 
bias—a selective emphasis on central motiva- 
tional determinants at the expense of adequate 
attention to the verbal and motor response 
dispositions of the subject and the relation of 
these dispositions to the dimensions of the 
stimulus. . . . We must then reaffirm the criti- 
cal importance of a full and precise analysis 
of the responses as well as the stimuli which 
furnish the basic data of perceptual
experiments (pp. 17-19).


COMPONENTS OF THE ACCURACY 
SCORE 


In the typical experiment we have
O's self-description $x_{oi}$ on a set of
items, and a set of predictions $y_{oij}$
by J. Error in prediction is represented
by the discrepancy between
$x_{oi}$ and $y_{oij}$. An over-all score
representing J's ability to perceive others
is obtained by averaging his squared 
errors over all items and all Os. The 
precise mathematical formulation of 
this Accuracy score, and the assump- 
tions involved, are presented in the 
Appendix. It is important to renew 
here (see 8, p. 457) the warning that 
any index combining results from 
heterogeneous items presents serious 
difficulties of interpretation. Whatever
factors the items measure, a
“global” measure combines them with
definite weights into a composite.
Effects which operate differently on 
the several factors may be masked. 
An accuracy score based on hetero- 
geneous items is only an exploratory 
procedure; where possible it should 
be replaced or extended by separate 
analyses of J’s ability to predict dif- 
ferent qualities of O. 

The Appendix shows that the usual 
Accuracy score is the sum of four 
components we shall call Elevation 
(E), Differential Elevation (DE),
Stereotype Accuracy (SA), and Dif- 
ferential Accuracy (DA). 
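
The four-way split can be checked numerically. The sketch below is an illustration only, not the Appendix derivation: it assumes the components are the grand-mean, Other, item, and residual parts of the error $y_{oij} - x_{oi}$ for one Judge, which is what the verbal definitions in this section describe. The function name and the small data set are hypothetical.

```python
from statistics import fmean

def accuracy_components(x, y):
    """Split one Judge's mean squared error into E, DE, SA and DA parts.

    x[o][i]: self-description of Other o on item i (the criterion).
    y[o][i]: the Judge's prediction of that response.
    """
    n_others, n_items = len(x), len(x[0])
    # Errors d = y - x; each component is one part of a two-way layout of d.
    d = [[y[o][i] - x[o][i] for i in range(n_items)] for o in range(n_others)]
    grand = fmean(v for row in d for v in row)
    row_means = [fmean(row) for row in d]                      # one per Other
    col_means = [fmean(d[o][i] for o in range(n_others))       # one per item
                 for i in range(n_items)]
    return {
        "ACC2": fmean(v ** 2 for row in d for v in row),
        "E2": grand ** 2,
        "DE2": fmean((r - grand) ** 2 for r in row_means),
        "SA2": fmean((c - grand) ** 2 for c in col_means),
        "DA2": fmean((d[o][i] - row_means[o] - col_means[i] + grand) ** 2
                     for o in range(n_others) for i in range(n_items)),
    }

# Hypothetical data: three Others, three items.
x = [[1, 2, 3], [2, 2, 4], [3, 1, 2]]
y = [[2, 2, 4], [1, 3, 4], [3, 3, 3]]
parts = accuracy_components(x, y)
component_sum = parts["E2"] + parts["DE2"] + parts["SA2"] + parts["DA2"]
```

For any rectangular data set `component_sum` equals `parts["ACC2"]` up to rounding, because the four parts of the error are orthogonal.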

Elevation (E). The Elevation component
has the form $(\bar{y}_{..j} - \bar{x}_{..})^2$. $\bar{y}_{..j}$
is the average of J’s predictions over
all items and all Os; it reflects his 
way of using the response scale. The 
Elevation component is increased by 
any difference between J’s central 
tendency of responding and the cen- 
tral tendency of the self-descriptions, 
for all items and Os combined. 

Differential Elevation (DE). Dif- 
ferential Elevation reflects how close- 
ly J’s average prediction for O cor- 
responds to O’s central tendency of 
response, all items pooled and J’s 
central tendency of response held 
constant. That is, it reports J’s 
ability to judge deviations of the 
individuals’ elevation from the average.
We may write DE in this form:

$$DE_j^2 = \sigma_{\bar{y}_{o.j}}^2 + \sigma_{\bar{x}_{o.}}^2 - 2\,\sigma_{\bar{y}_{o.j}}\,\sigma_{\bar{x}_{o.}}\,r_{\bar{x}_{o.}\bar{y}_{o.j}}. \qquad [1]$$

The variance $\sigma_{\bar{y}_{o.j}}^2$ expresses J's
report of how much Os will differ in
elevation. This assumed dispersion
in elevation will later appear as a
component of the Assumed Similarity
score. $\sigma_{\bar{x}_{o.}}^2$ is the true dispersion in
elevation. The correlation $r_{\bar{x}_{o.}\bar{y}_{o.j}}$ (to
be symbolized DEr) represents J's
ability to judge which Os rate highest
on the elevation scale.

In some tests, central tendency of 
response (elevation) reflects insignificant
response sets. In other tests,
elevation has an important psycho- 
logical meaning. Thus, if a high score 
on each item indicates morale, the 
correlation DEr shows how well J 
can judge which Os say they have the 
highest morale. 
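
The variance-and-correlation form of Differential Elevation can be verified on a few numbers. This is a minimal sketch with hypothetical elevation means for four Os; it assumes population (divide-by-N) variances and takes each set of means as deviations from its own average, as the definition of DE requires.

```python
from math import sqrt
from statistics import fmean

def differential_elevation(x_means, y_means):
    """Return DE^2 computed directly, DE^2 from variances and DEr, and DEr."""
    mx, my = fmean(x_means), fmean(y_means)
    dx = [v - mx for v in x_means]          # true elevation deviations
    dy = [v - my for v in y_means]          # predicted elevation deviations
    var_x = fmean(v * v for v in dx)
    var_y = fmean(v * v for v in dy)
    cov = fmean(a * b for a, b in zip(dx, dy))
    de_r = cov / sqrt(var_x * var_y)        # elevation correlation, DEr
    direct = fmean((b - a) ** 2 for a, b in zip(dx, dy))
    from_parts = var_y + var_x - 2 * sqrt(var_y) * sqrt(var_x) * de_r
    return direct, from_parts, de_r

# Hypothetical mean responses of four Os and a Judge's mean predictions.
direct, from_parts, de_r = differential_elevation(
    [2.0, 2.5, 3.0, 3.5], [1.0, 3.0, 2.0, 4.0]
)
```

Here the Judge orders the Os fairly well (DEr = .80) but assumes twice the true spread, and the excess dispersion is what keeps DE from being small.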

Stereotype Accuracy (SA). Stereo- 
type Accuracy describes J’s ability 
to predict the norm for Os. It might 
be called “accuracy in predicting the
generalized other” (3). This score
depends on J’s knowledge of the 
relative frequency or popularity of 
the possible responses. 

We may write:

$$SA_j^2 = \sigma_{\bar{y}_{.ij}}^2 + \sigma_{\bar{x}_{.i}}^2 - 2\,\sigma_{\bar{y}_{.ij}}\,\sigma_{\bar{x}_{.i}}\,r_{\bar{x}_{.i}\bar{y}_{.ij}}. \qquad [2]$$

Here each variance is computed over items. The variance
$\sigma_{\bar{x}_{.i}}^2$ is the scatter of the actual means. SA
represents ability to predict the profile of item means both as to
shape and scatter. $r_{\bar{x}_{.i}\bar{y}_{.ij}}$ (Stereotype
Correlation, SAr) represents accuracy in judging mean profile shape,
i.e., the order of item difficulties.

Differential Accuracy (DA). Differential Accuracy reflects ability
to predict differences between Os on any item. This component is
averaged over items. As the Appendix indicates, this component too
may be regarded as combining variances with a correlation term. The
correlation (DAr) measures ability to judge which Os have highest
scores on the item, when the score is taken as a deviation from Os’
mean. There is one such correlation for each item.

Removal of the E and SA components reduces all data to deviations
from the group mean, or from the predicted mean. The DE component
examines ability to recognize individual differences in the first
centroid factor underlying the items. The DA component pools all
remaining factors. There is nothing to prevent estimating further
factors among the items by factor analysis, and determining DA for
each factor separately. This type of cluster score on major factors
appears preferable theoretically to the simpler index,
$\sum_i DA_i$, and more reliable than the $DA_i$ taken separately
(8).

Implications. Seven aspects of J’s performance have been separated:

   Elevation
      1. Elevation component (E)
   Differential Elevation (DE)
      2. Assumed dispersion in Elevation
      3. Elevation correlation (DEr)
   Stereotype Accuracy (SA)
      4. Predicted variation in item means
      5. Stereotype correlation (SAr)
   Differential Accuracy (DA)
      6. Assumed dispersion on any item (Elevation held constant)
      7. Differential correlation (DAr)

The components are not necessarily 
uncorrelated. Change in any com- 
ponent alters the Accuracy score. 
Surely these aspects of social per- 
ception do not all reflect the same 
trait. A Judge who happens to use 
the same region of the response scale 
as other persons (Elevation is small) 
need not have superior insight.
Judging which items have the highest 
mean seems to require acquaintance 
with the norms of the group; but a 
person might possess such knowledge 
and yet be unable to differentiate 
accurately between individuals (16). 

At best, failure to separate these 
components makes interpretation 
ambiguous. Chowdhry and New- 
comb (5) requested group members 
to predict what percentage of their 
group would agree with each of many 
attitude statements. Ability to make 
this prediction was judged by a dif- 
ference score, and this score was 
found to correlate significantly with 
leadership status. This score, how- 
ever, combines Elevation and Stereo- 
type Accuracy; until the components 
are separately measured we cannot 
rule out the possibility that leaders 
simply used the correct range of the 
scale more often than nonleaders. 
This, in turn, might reflect willing- 
ness (or unwillingness) to use extreme 


percentages rather than any other 
subtle perceptiveness of specific atti- 


tudes. That such effects do occur is 
shown by Lorge and Diamond, who 
required judges to estimate what 
proportion of Os would pass ability 
test items. They found that poor 
judges were greatly helped simply by 
being told the difficulty of a few 
specimen items. “Apparently the
difference between ‘poorest,’ ‘medi- 
ocre,’ and ‘best’ judges is that the 
‘best’ judges have some experiential 
referent for the per cent of the popu- 
lation that can pass an item. Giving 
such referents to the ‘poorest’ and 
‘mediocre’ judges . . . leads to a sig- 
nificant reorientation of such judg- 
ments” (19, p. 33). When judges 
responded only to the items, the best 
judges had a mean SAr of .73 and 
the poorest, .56. After information 
on averages for items was given, 
the same judges had mean correla- 
tions of .77 and .73. The difficulty 
encountered in interpreting the
Chowdhry-Newcomb study does not 
arise in Talland’s study of the same 
problem (33) where subjects were 
asked to predict what ranks will be 
assigned to certain stimuli. In rank- 
ing, elevation and dispersion are the 
same for everyone, and therefore the 
scores depend only on SAr. 
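
A small check of the ranking argument, offered as a sketch with hypothetical rankings: when every Judge must use the ranks 1 to n exactly once, the mean and variance of his responses are fixed, so the mean squared error reduces to $2\sigma^2(1 - r)$ and depends on the correlation alone.

```python
from statistics import fmean, pvariance

def pearson(a, b):
    """Product-moment correlation of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

true_ranks = [1, 2, 3, 4, 5]        # hypothetical group ranking of five stimuli
predicted_ranks = [2, 1, 3, 5, 4]   # one Judge's predicted ranking

msd = fmean((p - t) ** 2 for t, p in zip(true_ranks, predicted_ranks))
rank_var = pvariance(true_ranks)    # identical for every complete ranking
r = pearson(true_ranks, predicted_ranks)
from_correlation = 2 * rank_var * (1 - r)
```

Both `msd` and `from_correlation` come to .80 here; no Elevation or dispersion term can enter, since those are fixed by the ranking format.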

Failure to identify the compo- 
nents of the Accuracy score can lead 
to artifactual correlations. Only a 
few of the many examples in the 
literature need be cited. Norman 
and Ainsworth (24) report a large 
number of correlations between Accuracy
(“empathy”) and Assumed
Similarity (“projection”). Since the
Accuracy score contains Assumed 
Similarity components, the two scores 
would necessarily overlap even if 
both sets of responses were determined
strictly by chance. The correlations 
have no psychological meaning. Dy- 
mond (11) reported that persons with 
high Accuracy are also most easily 
judged. But a person who uses the 
scale in a typical manner will have a 
small Elevation component, hence 
better Accuracy; and other persons 
will have smaller Elevation errors in 
judging him, simply because of this 
typicality. This would happen even 
if the Judge predicted his responses 
without ever meeting him! Perhaps 
social psychologists should take what 
comfort they can from Bertrand 
Russell’s remark that physicists 
“have not yet reached the point 
where they can distinguish between 
facts about relativity and mathe- 
matical operations which may have 
nothing to do therewith.” 
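
The overlap between Accuracy and Assumed Similarity can be illustrated by simulation. The sketch below is a hypothetical setup, not a reanalysis of any cited study: every response is random noise, so no Judge has any true accuracy, and the only systematic difference among Judges is an assumed response set, namely how widely each spreads his predictions.

```python
import random
from statistics import fmean

def pearson(a, b):
    """Product-moment correlation of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

def chance_overlap(n_judges=30, n_others=50, n_items=40, seed=0):
    """Correlate squared-error ACC and AS scores when all responses are noise."""
    rng = random.Random(seed)
    # Self-descriptions of the Others: pure noise.
    x = [[rng.gauss(0, 1) for _ in range(n_items)] for _ in range(n_others)]
    acc, assumed_sim = [], []
    for j in range(n_judges):
        spread = 0.2 + 1.8 * j / (n_judges - 1)    # Judge's habitual dispersion
        self_desc = [rng.gauss(0, 1) for _ in range(n_items)]
        sq_err, sq_diff = [], []
        for o in range(n_others):
            for i in range(n_items):
                guess = rng.gauss(0, spread)       # unrelated to x[o][i]
                sq_err.append((guess - x[o][i]) ** 2)
                sq_diff.append((guess - self_desc[i]) ** 2)
        acc.append(fmean(sq_err))
        assumed_sim.append(fmean(sq_diff))
    return pearson(acc, assumed_sim)

overlap = chance_overlap()
```

The two scores correlate highly in this setup only because the assumed dispersion of the predictions enters both of them, which is the kind of artifact described above.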


COMPONENTS OF THE ASSUMED 
SIMILARITY SCORE 


The J is said to “assume similarity”
between himself and O if J's
prediction for O differs little from
J’s self-description. One might study
assumed similarity with respect to
each O separately, but we shall give
attention to the tendency of J to 
assume similarity over Os in general. 
The formula for this AS score is 
given in the Appendix. AS is sometimes
interpreted as “projection”
or “identification” (29). As indicated
in Equation 5a of the Appendix,
the AS score divides into
the components Assumed Elevation
(AE), Assumed Self-Typicality
(AST), and two Assumed Dispersions
(ADE, ADI).

Assumed Similarity in Elevation 
(AE). The first component, As- 
sumed Similarity in Elevation, takes 
the form $(\bar{y}_{..j} - \bar{x}_{j.})^2$. It measures J's
tendency to assume that Os have the 
same average response as he does. 
This component would be important 
if items are polarized so that a high 
score on each represents good ad- 
justment or some other interpretable 
quality; the score then shows whether 
J regards the average O as similar to 
himself in this central dimension. 

Assumed Dispersions (ADE, ADI). 
A second component is $\sigma_{\bar{y}_{o.j}}^2$, the Assumed
Dispersion in Elevation (ADE). Another is the Assumed
Dispersion on specific items after
differences in Elevation are removed
(ADI). ADI closely resembles
Gage’s concept of “rigidity” or “adherence
to stereotype” in prediction
(14, p. 16; 15). These dispersions
have already been encountered as
components of ACC (see Equation
1 above and Equation 4a in Appendix).
We shall refer to them as
Assumed Dispersion in Elevation
(ADE) and Assumed Dispersion
Within Items (ADI), respectively.

Assumed Self-Typicality (AST). 
The remaining component measures 
the discrepancy between J's perception
of the average O and his self-description.
This component tells
whether J regards his own profile 
as typical in shape. Or, we might 
say, this component shows the similarity
of J’s self-perception to his
implicit stereotype of Os (Elevation 
held constant). We follow Gage 
(14, p. 17) in calling this Assumed 
Self-Typicality (AST). 

Of the four AS components, only 
AST divides into separate variance 
and correlational terms, as shown in 
Equation 6a of the Appendix. The 
correlation represents the similarity 
between J's self-description and the 
average profile, ignoring differences 
in elevation and scatter. We call it 
the Self-Typicality Correlation (STr).
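
The Assumed Similarity score can be split in the same way as Accuracy. The following sketch assumes that AS for one Judge is the mean squared difference between his predictions and his own self-description, and that the four components are the grand-mean, Other, item, and residual parts of that difference. The function name and data are hypothetical, and the Appendix equations are not reproduced here.

```python
from statistics import fmean

def assumed_similarity_components(y, self_desc):
    """Split one Judge's AS^2 into AE, ADE, AST and ADI.

    y[o][i]: the Judge's prediction for Other o on item i.
    self_desc[i]: the Judge's own response to item i.
    """
    n_others, n_items = len(y), len(y[0])
    cells = [(o, i) for o in range(n_others) for i in range(n_items)]
    grand = fmean(y[o][i] for o, i in cells)
    row_means = [fmean(row) for row in y]
    col_means = [fmean(y[o][i] for o in range(n_others)) for i in range(n_items)]
    self_mean = fmean(self_desc)
    return {
        "AS2": fmean((y[o][i] - self_desc[i]) ** 2 for o, i in cells),
        # Average prediction versus the Judge's own average response.
        "AE": (grand - self_mean) ** 2,
        # How much the Judge lets Os differ in overall level.
        "ADE": fmean((r - grand) ** 2 for r in row_means),
        # Predicted item profile versus the Judge's own profile.
        "AST": fmean(((col_means[i] - grand) - (self_desc[i] - self_mean)) ** 2
                     for i in range(n_items)),
        # Differentiation among Os within items, Elevation removed.
        "ADI": fmean((y[o][i] - row_means[o] - col_means[i] + grand) ** 2
                     for o, i in cells),
    }

# Hypothetical predictions for three Os on three items, plus the self-description.
parts = assumed_similarity_components(
    [[2, 2, 4], [1, 3, 4], [3, 3, 3]], [2, 1, 3]
)
component_sum = parts["AE"] + parts["ADE"] + parts["AST"] + parts["ADI"]
```

Note that ADE and ADI depend on the predictions alone, which is why the same two dispersions also appear inside the Accuracy score.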


IMPROVEMENT OF PREDICTIONS 


Insofar as our mathematical model 
is an acceptable approximation to 
real problems, we can reason mathe- 
matically to determine how judg- 
ments of Os may be improved. The 
conditions which make errors of 
prediction as small as possible are 
stated fully in the Appendix. The 
most significant principle takes this 
form: Accuracy is improved as $\sigma_y$
approaches $r_{xy}\sigma_x$. That is to say, the
variation in predictions should never 
exceed the variation in true responses, 
and should ordinarily be much 
smaller. 

This principle indicates that there 
is an optimal degree of differentiation 
in making judgments. If J can make
accurate judgments as to the relative
location of Os on a continuum, then
he is wise to make $\sigma_y$ as large as $\sigma_x$,
never larger. But if he is forced to
base his judgment on inadequate cues 
or if the available personality theory 
and situational knowledge do not 
permit trustworthy inference, then 
he should treat people as if they 
were very nearly alike. The person 
who attempts to differentiate indi- 
viduals on inadequate data intro- 
duces error even when the inferences 
have validity greater than chance. 

The variation of J’s predictions 
indicates how much he differentiates. 
For example, a teacher estimating 
IQ’s in a class might spread them
from 90 to 110 or from 70 to 130. 
We would expect the judge who per- 
ceives greater differences to apply 
more sharply differentiated treat- 
ments to the various persons. A 
person who knows that the expected
$\sigma$ for IQ's is 16 might try to predict
so that his estimates would have
this $\sigma$; but unless he is a perfect
judge, this is unwise. He will have
smaller errors if his predicted $\sigma$ is
less than 16; how much less depends
on the correlational accuracy of
his predictions.
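
As an illustration of the IQ example, the sketch below pulls a set of estimates toward their own mean until their standard deviation equals $r\sigma_x$. The validity of .50 and the five raw estimates are hypothetical; only the $\sigma$ of 16 comes from the text.

```python
from statistics import fmean, pstdev

def shrink_to_sd(estimates, target_sd):
    """Rescale estimates about their mean so that their SD equals target_sd."""
    center = fmean(estimates)
    current_sd = pstdev(estimates)
    if current_sd == 0:
        return list(estimates)          # nothing to shrink
    factor = target_sd / current_sd
    return [center + factor * (e - center) for e in estimates]

iq_sd = 16.0
validity = 0.50                         # hypothetical correlational accuracy
optimal_sd = validity * iq_sd           # r * sigma_x = 8 IQ points

raw_estimates = [70, 85, 100, 115, 130] # a teacher who spreads pupils widely
adjusted = shrink_to_sd(raw_estimates, optimal_sd)
```

The rescaling leaves the rank order of the pupils, and hence the correlation with the criterion, untouched; only the degree of differentiation changes.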

If two diagnosticians can each 
judge some trait with correlational 
validity .40, the one who differenti- 
ates strongly (i.e., makes extreme 
statements) will make far more seri- 
ous absolute errors than the one who 
differentiates moderately. Indeed, 
the person who makes extreme dif- 
ferentiations based on a validity of 
.40 may have larger errors than a
judge who has zero correlational 
validity but gives the same estimate 
for everyone. This contradicts the 
view that judgment is always im- 
proved by taking into account addi- 
tional valid information. 
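
The comparison of diagnosticians can be worked through with the expected squared error $\sigma_y^2 + \sigma_x^2 - 2r\sigma_y\sigma_x$, which holds when the average prediction is correct. The particular dispersions below are hypothetical choices on a standardized criterion ($\sigma_x = 1$); only the validity of .40 comes from the text.

```python
def expected_mse(sd_pred, sd_true, r):
    """Expected squared error when predicted and true means coincide."""
    return sd_pred ** 2 + sd_true ** 2 - 2 * r * sd_pred * sd_true

sd_true = 1.0
extreme = expected_mse(1.5, sd_true, 0.40)    # spreads more than the criterion
matched = expected_mse(1.0, sd_true, 0.40)    # spreads as much as the criterion
moderate = expected_mse(0.4, sd_true, 0.40)   # spreads r * sd_true, the optimum
constant = expected_mse(0.0, sd_true, 0.0)    # same estimate for everyone
```

With these figures the errors are 2.05, 1.20, .84 and 1.00 respectively, so the judge with validity .40 beats the constant estimate only if he restrains his differentiation.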

Implications regarding clinical 
judgment. Clinical judgments are 
frequently regarded as undepend- 
able, because of research tending to 
show that clinicians weight predictor 
variables inappropriately. Thus Sar- 
bin showed that counselors given a 
great amount of information pre- 
dicted grade averages no more ac- 
curately than did a regression for- 
mula, one reason being that the coun- 
selors gave the ACE test excessive 
weight (27). A similar problem of 
weighting is involved in attaining 
the ideal dispersion of estimates. The 
regression equation for two uncorrelated
predictors in standard form
is:

$$\hat{x}_1 = r_{12}x_2 + r_{13}x_3 + w_e x_e. \qquad [3]$$

The $w_e$ in the last term is a weight
for error, and $x_e$ is an individual's
estimated error score. We do not
usually write this term because $x_e$ is
zero and the term vanishes. Since
$\sigma_{\hat{x}_1} = \sqrt{1 - w_e^2} = R\sigma_1$, the statistical
prediction formula gives the optimum
degree of differentiation, using the
proper $w_e$. But the clinician who
combines variables 2 and 3 with the
best relative weights may still obtain
an inaccurate estimate if he employs
the wrong $w_e$, and so makes $\sigma_{\hat{x}_1}$ too
large.
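
A numerical sketch of the error weight for two uncorrelated predictors in standard form. The validities of .30 and .40 are hypothetical; the relations used are $R^2 = r_{12}^2 + r_{13}^2$ and $w_e = \sqrt{1 - R^2}$, which follow from the predictors being uncorrelated and all variables having unit variance.

```python
from math import sqrt

def dispersion_and_error_weight(r12, r13):
    """Return (R, w_e) for two uncorrelated standardized predictors."""
    r_squared = r12 ** 2 + r13 ** 2
    if r_squared > 1:
        raise ValueError("validities are inconsistent with uncorrelated predictors")
    multiple_r = sqrt(r_squared)     # SD of the regression estimates
    error_weight = sqrt(1 - r_squared)
    return multiple_r, error_weight

def regression_estimate(r12, r13, x2, x3):
    """Least-squares estimate; the error term is omitted because x_e is zero."""
    return r12 * x2 + r13 * x3

multiple_r, error_weight = dispersion_and_error_weight(0.30, 0.40)
estimate = regression_estimate(0.30, 0.40, x2=1.0, x3=2.0)
```

With these validities the estimates should have a standard deviation of only .50, and a person one and two standard deviations above the mean on the two predictors is estimated at 1.1, not near 2.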

There is evidence, both from Sar- 
bin’s study and from similar recent 
work by R. S. Melton (20, 21) that 
counselors do overdifferentiate. In 
Sarbin’s study, $\sigma_x$ for grade-point
averages was .88 (for both sexes,
computed by this writer from Sarbin’s
report). The statistical predictions
had a $\sigma$ of .47; the clinical
predictions, .57. In unpublished data
supplied by Melton, $\sigma_x$ was .59, but
$\sigma_y$ (for predictions) was .60. The
optimum, $r\sigma_x$, would presumably
have been near .35.
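
The reported dispersions are easier to compare as ratios of predicted to actual standard deviation. The short calculation below uses only the figures quoted above; the dictionary labels are ours. For a least-squares formula this ratio equals the multiple correlation, so it indicates the degree of differentiation that the available validity can support.

```python
# (SD of predictions, SD of obtained grade-point averages), as quoted in the text.
reported = {
    "Sarbin, regression formula": (0.47, 0.88),
    "Sarbin, counselors": (0.57, 0.88),
    "Melton, counselors": (0.60, 0.59),
}

# Ratio of predicted to actual dispersion for each source.
ratios = {name: pred / crit for name, (pred, crit) in reported.items()}

# Ratio implied by the optimum of about .35 suggested for Melton's data.
melton_optimum_ratio = 0.35 / 0.59
```

The ratios are about .53, .65 and 1.02, against roughly .59 for the presumed optimum in Melton's data: the counselors in both studies spread their estimates more than the regression standard allows.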

Sarbin was puzzled by the finding 
that $\sigma_y < \sigma_x$, and he discusses this as
a “disadvantage” (28, p. 599). His
remarks resemble those of others on
the so-called “central tendency of
judgment,” which has hitherto been
regarded as a source of inaccuracy 
in social perception (1, p. 521). But 
estimates from a regression formula 
have a lower variance than observed 
results just because weight must be 
assigned to error. Sarbin’s clinicians 
made some, but insufficient, allow- 
ance for their error. Melton’s clini- 
cians made no allowance for error. 
Evidently the fault of the clinician is 
too little “central tendency of
judgment.”

Melton’s study combines all 
sources of error into a single measure 
of absolute error. He reports that 
counselors make greater error than a 
statistical predictor, and that this 
remains true even when the clinicians
are given an actuarial table
as a guide. Decomposing his ac- 
curacy score into components yields 
further knowledge about the predic- 
tive process. From the data he has 
supplied us, we learn that the group 
given the actuarial table makes no 
error in estimating mean GPA, while 
the control group overestimates by 
.65$\sigma$. Both groups differentiate far
more than the optimum (and this 
effect appears to be intensified with 
the actuarial table). From the data 
at hand, we do not know whether 
correlation accuracy (DAr) is higher 
or lower for Melton’s counselors 
than for a statistical prediction, nor 
how the actuarial table altered this 
aspect of their judgment. 

Systematic errors such as overoptimism
and overdifferentiation may
be corrected fairly easily. It is important
for studies of clinical judgment
to measure these errors as separate
components, and for clinicians to
train themselves to avoid these
errors.

Implications for teaching. Recog- 
nizing an optimum degree of differ- 
entiation makes it necessary to re- 
examine and qualify statements com- 
monly made in training teachers, to 
the effect that every pupil has his 
own pattern and the teacher must 
fit methods to that pattern, not treat 
the pupil in terms of the statistical 
average. The writer has himself ex- 
pressed such views, but it now ap- 
pears that the teacher who is poorly 
informed regarding the unique pat- 
terns of his pupils should probably 
treat them by a standard pattern of 
instruction, carefully fitted to the 
typical pupil. Modifying plans dras- 
tically on the basis of limited diagnos- 
tic information may do harm. Dif- 
ferentiation is harmful if the extent 
of adaptation or differentiation ex- 
ceeds the amount justified by the 
accuracy of social perception. 

Teachers may properly modify
treatments considerably when they 
are well able to judge individual 
differences. Differences in arithmetic 
achievement they might judge quite 
accurately; if so, they could profit- 
ably provide quite different assign- 
ments for different individuals. But 
if it is hard to judge creative potential 
in art, say, or the ultimate vocational 
goal of a ninth-grader, then it is a 
great mistake to differentiate treat- 
ment to fit perceived differences. 


ILLUSTRATIVE ANALYSIS OF 
CORNELL DATA 


To illustrate our system of analysis, 
we use data kindly provided by 
Bronfenbrenner and Dempsey. The 
data were gathered at Cornell Uni- 
versity primarily for pilot analyses 
such as ours. We shall deal with 
eight subjects and eight items. The 
eight subjects were candidates for 
employment as interviewers. Each 
person interviewed each of the seven 
others. In each interview, each man 
was to obtain information about his 
partner. Following the interview, 
each person stated his own reaction 
to 19 items (of which we use only 
eight) and predicted what his partner
would say. One item is: To what
extent did you feel at ease during
the interview? ____a. very much
____b. a good bit ____c. only slightly
____d. not at all. The respective
responses are scored 1-2-3-4.

Completion of the design provides 
seven self-descriptions and seven 
predictions by each man (also seven 
for each man). We have taken two 
simplifying steps which might be 
illegitimate for purposes other than 
demonstration. We use the average
of O's responses over all seven interviews
as his true response, $x_{oi}$, discarding
information on O's variation
from interview to interview. Secondly,
we treat J’s self-description as a
“perfectly accurate prediction of
himself.” By this device, we deal at
all times with eight Js and eight
Os, and the criterion is made the
same for every person.

TABLE 1
Accuracy Scores of Eight Judges Divided into Components

[Table values not recoverable from the source. Columns: ACC;
Elevation (E); Differential Elevation (DE), with its constituents;
Stereotype Accuracy (SA), with its constituents; Differential
Accuracy (DA), with its constituents. Rows: the eight Judges, Mean,
Variance.]

* Averaged over items.
† The mean values of $\sigma_y$, .25 and .40, may be compared to these
respective values of $\sigma_x$, the true variation: .22, .44.
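
The two simplifying steps described above can be sketched as follows. The nested-dictionary layout, the function names, and the three-person, two-item data are hypothetical and chosen only to keep the example small; the Cornell records themselves are not reproduced.

```python
from statistics import fmean

def build_criterion(self_reports):
    """Average each person's self-descriptions over his interviews.

    self_reports[o][partner] is the list of O's item responses after the
    interview with that partner; the result x[o][i] is the criterion.
    """
    criterion = {}
    for person, by_partner in self_reports.items():
        sheets = list(by_partner.values())
        n_items = len(sheets[0])
        criterion[person] = [fmean(s[i] for s in sheets) for i in range(n_items)]
    return criterion

def build_predictions(judge, predictions, criterion):
    """One Judge's predictions for everyone, himself included.

    His own criterion row stands in as a perfectly accurate prediction of
    himself, so every Judge is scored against the same set of Os.
    """
    return {
        person: list(criterion[person]) if person == judge
        else list(predictions[judge][person])
        for person in criterion
    }

self_reports = {
    "A": {"B": [1, 2], "C": [3, 2]},
    "B": {"A": [2, 2], "C": [2, 4]},
    "C": {"A": [4, 1], "B": [4, 3]},
}
predictions = {"A": {"B": [2, 3], "C": [3, 3]}}

x = build_criterion(self_reports)
y_for_a = build_predictions("A", predictions, x)
```

Because the self-prediction is perfect by construction, it contributes zero error for that O and also enters the Judge's assumed dispersions; the Assumed Similarity results later in the paper mention this as a partial artifact.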


Accuracy Scores for Eight Persons 


Table 1 presents the ACC score for 
each person, and his score on each 
component. These and subsequent 
results are illustrative, and not a
proper basis for generalization. 

Relation of differentiation to accuracy.
As expected, any component
decreases as the predicted standard
deviation ($\sigma_y$) approaches the product
of the related correlation with the
actual standard deviation ($\sigma_x$). Consider,
for example, the results on SA
and its constituents. Person 8 is an 
excellent Judge, according to his SAr 
of .92. But he expects too much 
variation in the item means (.76 
compared to an actual $\sigma$ of .44). As
a consequence No. 8 has a poor SA 
score despite his excellent ability to 
discriminate between items. The 
best SA scores are earned by No. 1 
and No. 5, who have high correla- 
tions and who predicted variance 
close to the actual variance. Compare 
also DE and DEr of No. 3 and No. 8. 
These persons have the same DEr, 
but No. 3 underestimates the variation
in elevation, and No. 8 overestimates
it. As expected, No. 3 earns
the better DE score.

The judges consistently overdifferentiate.
The optimal $\sigma_y$ would be
about .025 in the DE column, and
.30 in SA; but the actual mean
values are .25 and .40. Since $\sigma_{x'}^2 = .12$,
it is clear that $\sigma_{y'}^2$ is also
generally larger than the optimum.
Reliabilities and intercorrelations. 
Internal consistency was studied for 
the various components, but the 
results based on eight cases need not 
be reported. One finding, however, 
is notable. Differential Accuracy 
was strikingly consistent over items: 
a coefficient of .73 was obtained by 
analysis of variance. That is to say, 
some predictors were consistently 
good over all items, others consistent- 
ly poor. But when we examine the 
components of DA, we find that As- 
sumed Dispersion Within Items is 
consistent over items (.79), and DAr, 
the measure of accuracy in differenti- 
ating, is not (.18). In this sample, 
Differential Accuracy shows reli- 
ability only because some persons 
have consistent sets to differentiate. 
Stone and Leavitt (32) likewise find
very low consistency (-.07 to .30)
of accuracy scores in predicting different
children, but a median consistency
of .63 between two predictions for
the same child. They trace the latter
consistency to consistent favorable
sets toward a given child, and to
assumed similarity. All results to
date lead us to doubt whether accuracy
in differentiating personalities
of others can be reliably measured.
Where reliable variance is found, it
seems to result from some constant
mental set.

TABLE 2
Assumed Similarity Scores Divided into Components

[Table values not recoverable from the source. Columns: AS²;
Assumed Elevation (AE); Assumed Dispersion in Elevation (ADE);
Assumed Self-Typicality (AST)*; Self-Typicality Correlation (STr);
Assumed Dispersion Within Items (ADI). Rows: the eight Judges, Mean,
Variance.]

* A composite of STr and $\sigma_{\bar{y}_{.ij}}$ of Table 1. See Equation 6a of Appendix.

In Table 1 we note that No. 1 is 
consistently superior on various com- 
ponents of Accuracy and No. 4 is 
consistently inferior. But No. 7, 
the best predictor as judged by DAr, 
is the poorest on DEr and next to 
poorest on SAr. With only eight 
cases, meaningful correlations can- 
not be obtained. 

Future studies of predictive ac- 
curacy should measure the compo- 
nents separately, preferably using 
two independent sets of items and 
Os. Such measurement will permit ac- 
curate determination of reliabilities 
of components, of the relation be- 
tween the components, and of their 
relation, if any, to external criteria. 
Ideally, items would be organized 
into clusters to permit study of 
predictions on separate traits. Only 
after such research can we decide
how many components within the
over-all Accuracy score presently 
used are important, and which un- 
wanted components must be sup- 
pressed by appropriate design of 
tests and scoring keys. 
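
Reliabilities of components over items, like the consistency coefficients reported above as "obtained by analysis of variance," can be estimated from a persons-by-items table. The text does not say which coefficient was computed; the sketch below assumes a Hoyt-type ratio, (MS persons - MS residual) / MS persons, and uses hypothetical scores.

```python
from statistics import fmean

def anova_reliability(scores):
    """Internal consistency from a two-way layout; scores[p][i] is persons x items."""
    n_persons, n_items = len(scores), len(scores[0])
    grand = fmean(v for row in scores for v in row)
    person_means = [fmean(row) for row in scores]
    item_means = [fmean(scores[p][i] for p in range(n_persons))
                  for i in range(n_items)]
    ss_persons = n_items * sum((m - grand) ** 2 for m in person_means)
    ss_items = n_persons * sum((m - grand) ** 2 for m in item_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_residual = ss_total - ss_persons - ss_items
    ms_persons = ss_persons / (n_persons - 1)
    ms_residual = ss_residual / ((n_persons - 1) * (n_items - 1))
    return (ms_persons - ms_residual) / ms_persons

# Hypothetical component scores of three Judges on two items.
consistency = anova_reliability([[1, 3], [2, 2], [3, 4]])
perfect = anova_reliability([[1, 2], [2, 3], [3, 4]])
```

Applied separately to each component (for example to the item-by-item DAr values and to the assumed dispersions), such a coefficient shows which part of a reliable composite actually carries the consistency.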


Assumed Similarity Scores for Eight 
Persons 


In Table 2, the Assumed Similarity 
scores are divided into components. 
The relatively large variance of ADI 
indicates that it has great influence 
on individual differences in over-all 
AS. 

AE correlates .81 with ADE, and 
AST correlates .97 with ADI. In 
these data, the tendency to differen- 
tiate among Os is accompanied by a 
tendency to differentiate the average 
O from oneself. This result is partly 
an artifact, resulting from using each 
person's self-description as one of his 
“predictions.” Even allowing for 
this, our correlations suggest separating
only two components of AS:
Assumed Similarity in Elevation 
(AE+ADE) and Assumed Similarity 
in Pattern (AST+ADI). The cor- 
relation between these is only .21. 
Further evidence is required, how- 
ever, to establish definitely how to 
divide Assumed Similarity. An earlier
study (9) suggests strongly that Assumed
Similarity is a general mental
set, almost independent of the psy- 
chological content of the items. A 
global index may therefore be satis- 
factory for this score. 


THE JUDGE’S “IMPLICIT PERSONALITY THEORY”


We turn now to an aspect of social 
perception data which may prove to 
be particularly significant. When a 
Judge describes or makes predictions 
for a large number of Os, these pre- 
dictions define a distribution of points 
in the variate space. This distribu- 
tion may be regarded as a description 
of the generalized O, representing 
J’s view of both central tendency 
and individual differences. The J’s 
generalized perception may be an 
important indicator of his expecta- 
tions regarding Os. We shall discuss 
the general significance of this per- 
ceptual system before tracing its 
effect on social perception scores. 

The J's distribution is to be exam- 
ined in terms of the means, variances, 
and covariances of the predictions. 
The mean may be regarded as J's 
stereotype; if the mean O in his de- 
scriptions is ‘‘hostile,’’ for example, 
this may be highly significant. The 
variance or assumed dispersion on a 
variate indicates J's tendency to 
differentiate along that dimension. 
The covariance is interpreted as 
indicating the relation J expects to 
find among variates. A given J may 
customarily report the same persons 
as high on both “quietness” and
“shyness,” for instance; or on both
“ambition” and “selfishness.” These
aspects of the distribution reveal 
J’s view of Os and the connotation 
of personality traits for him. We sug- 
gest that these means, variances, and 
covariances describe J's implicit 
theory of personality. 
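
As a minimal sketch of how the means, variances, and covariances just described could be computed from one Judge's predictions, the Python below assumes the predictions are arranged as rows of Others and columns of items. The function name `implicit_theory`, the choice of N (rather than N - 1) as divisor, and the ratings themselves are illustrative assumptions, not data or procedures from the studies cited.

```python
from statistics import mean


def implicit_theory(predictions):
    """Summarize one Judge's predictions (rows = Others, columns = items).

    Returns (means, variances, covariances); covariances[a][b] is the
    covariance over Others between items a and b, using divisor N.
    """
    n_others = len(predictions)
    n_items = len(predictions[0])
    # Regroup the data so that each list holds all predictions for one item.
    columns = [[row[i] for row in predictions] for i in range(n_items)]
    means = [mean(col) for col in columns]
    covariances = [
        [
            sum((columns[a][o] - means[a]) * (columns[b][o] - means[b])
                for o in range(n_others)) / n_others
            for b in range(n_items)
        ]
        for a in range(n_items)
    ]
    variances = [covariances[i][i] for i in range(n_items)]
    return means, variances, covariances


# Hypothetical ratings by one Judge: 4 Others on 3 items.
ratings = [
    [1, 2, 5],
    [2, 3, 4],
    [3, 4, 3],
    [4, 5, 2],
]
means, variances, covariances = implicit_theory(ratings)
```

In this made-up case the Judge rates the first two items together (positive covariance) and the third item in opposition to them (negative covariance), which is the kind of relation the covariance matrix is meant to expose.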

The expectations J has of Os constitute
his view of personality, and
one may hypothesize that they direct 
his responses to Os. G. A. Kelly 
(18) argues that each person forms 
“personal constructs” by means of
which he differentiates situations con- 
fronting him, including other persons. 
The “personal constructs” would be, 
in our model, the dimensions along 
which J differentiates strongly. As 
Steiner (31, p. 349) notes, it is par- 
ticularly important to investigate 
the covariation of his constructs. 
Osgood (25) suggests studying the 
semantic equivalence of stimuli by 
testing whether they are used simi- 
larly. Our method is quite like his, 
determining as it does what traits 
J uses to describe the same persons. 
While Steiner asks a person directly
what traits he expects to be associated,
we suggest looking at the covariation
found among the predictions
$y_{oij}$. Such implicit relations are
not subject to deliberate distortion 
and can reveal associations and norms 
of which J himself is unaware. 

An illustrative case. This concept 
can be illustrated by using a small 
portion of the Bronfenbrenner- 
Dempsey data. The J predicted re- 
sponses of eight persons (including 
himself) on these questions: 


1. In general, how openly did you express 
your feelings and emotions during the inter- 
view? 

2. How much interest did you feel in the 
other man as a person? 

3. How much were you aware of how he was 
feeling? 

4. How much opportunity did you give him 
to interview you? 

5. How much important information were 
you able to get about him? 

6. To what extent did you feel at ease dur- 
ing the interview? 

7. To what extent did you succeed in estab- 
lishing a good interviewing relationship? 

8. To what extent did you feel like the per- 
son being interviewed rather than the person 
doing the interviewing? 


TABLE 3

SUBSTANTIAL FACTORS IN JUDGE 3’S COVARIANCE MATRIX*

[Table 3. Columns: Item; Factor I, “Pressure”; Factor II, “Exchange of Information”; Factor III, “Rapport”; Mean Prediction; Variance. Cell values are not legible in this copy.]

* Largest loadings in each column in italics.

The matrix of covariances for
Judge 3, a poor predictor, was factored
by a pivotal method intended
to yield interpretable factors. Table 
3 shows the loadings on three factors, 
and also item means and variances. 
The means for Judge 3 show no strik- 
ing features, especially when con- 
sidered in relation to the true means 
presented in Table 4. The variances 
indicate that No. 3 regards others as 
fairly uniform in their awareness of 
him (item 3), and as varying espe- 
cially in openness, ease, and feeling 
of dominance (items 1, 6, 8). No 
confidence can be placed in factors 
based on eight measures, but we 
would otherwise interpret Factor I
as representing a feeling of being 
under pressure. It is notable that No. 
3 regards those persons who are most 
open (item 1) as being least at ease 
(item 6). Factor II shows a link 
between items 4 and 5, getting and 
giving information. Factor III is 
indistinct. It is notable that items 
6 and 7 are correlated; a “good
interviewing relation” is perceived
by No. 3 as one where the interviewer
is at ease! Such a finding
regarding No. 3’s perception, if
better substantiated, might have
much diagnostic importance.

Relevant prior studies. The litera- 
ture contains many studies of correla- 
tion between ratings which bear on 
the perceiver’s frame of reference. 
Reports of halo effect suggest the 
existence of a strong general good-bad 
factor. These studies have not exam- 
ined raters separately; Newcomb 
(23) showed that there were sub- 
stantial correlations among ratings 
(mean r = .49) even where direct behavioral
observations on the same qualities
showed a mean correlation of only
.14. “The close relation . . . ,” says
Newcomb, “may be presumed to
spring from logical presuppositions
in the minds of the raters” (p. 288).
Steiner (31) found evidence that 
ethnocentric individuals see others in 
black and white terms, the “good,”
“strong” traits going together. In
our language, their covariance matrix 
is loaded with one factor, while non- 
authoritarians use many factors and 
do not emphasize the general evalua- 
tive dimension. Soskin (30) showed 
that halo effect and the stereotype 
profile varied as a function of the 
data given the assessor and that
ratings of peers are distributed differently
from ratings by professional
assessors.

TABLE 4

FACTORS IN THE CRITERION COVARIANCE MATRIX DETERMINED BY PIVOTAL METHOD*

[Table 4. Columns: Item; Factor I, “Openness”; Factor II, “Receptiveness”; Factor III, “Passivity.” Cell values are missing from this copy.]

A striking recent study by Jones 
(17) compares the ratings given by 
authoritarian (A) and nonauthori- 
tarian (NA) groups to Others re- 
garding whom carefully controlled 
information had been given. He 
finds many types of differences in- 
cluding a tendency of the NA’s to 
respond to personal qualities of the 
Other, whereas A’s seem to differ- 
entiate less among leaders. The 
data are analyzed to show, in effect, 
which traits perceived in the Other 
vary with the Other’s democracy. 
The groups agree in associating de- 
mocracy with sensitive to Other, gener- 
ous, adaptable, warm. The A's associ- 
ate democrat also with unambitious, 
poor officer, undependable, hard to 
figure out, acts without thinking, 
rebellious; while the NA’s associate 
these qualities with autocrat. An at- 
tempt was made also to find correlates 
of forceful. The two groups showed 
little difference in covariances, as- 
sociating forceful with natural leader, 
ambitious, uses his head, etc. At least 
two other studies show differences in 


the perceptual reference frames of 
groups. Wickman’s well-known study 
(34) showed that teachers expected 
different traits to correlate with men- 
tal health than did mental hygien- 
ists. Moore (22) performed a factor 
analysis of ratings given noncom- 
missioned officers by their subordi- 
nates, and also of ratings given by 
their superiors. The factor patterns 
differed. For instance, superiors 
coupled leadership with eagerness and 
responsibility, but the subordinates 
linked leadership with intelligence and 
skill. 

None of these studies of groups 
examines the perceptual space by 
which an individual describes per- 
sonality, but the evidence supports 
the belief that important individual 
differences exist. Our proposed analy- 
sis of the covariance matrix can be 
applied to the matrix based on mean 
ratings given by a group of raters. 
This should give more complete in- 
formation regarding group differences 
in implicit meanings than the meth- 
ods used in the studies cited. The 
emphasis in recent studies has been 
to consider correlations between items 
as a meaningful phenomenon, related
to psychologically interesting
qualities of the rater. This may be 
contrasted to the view in the earliest 
studies such as Newcomb’s, where 
such correspondences were regarded 
solely as an annoying interference or 
so-called “logical error” in rating. It
also contrasts with many recent stud- 
ies which concentrate on the inter- 
action between Perceiver and Other, 
failing to inquire about elements 
associated with the Perceiver alone. 

The analysis of the implicit mean- 
ings of various dimensions for the 
Perceiver may be used in several 
ways. If, in teacher training, one 
aim is to modify the way in which 
teachers interpret behavior, these 
changes in viewpoint should be re- 
flected in changes of the perceptual 
distribution. For example, if teach- 
ers naively regard quietness as associ- 
ated with adjustment, yet experts 
regard quietness as unrelated or even 
negatively related to adjustment in
children, then the aim of training
is to reduce or reverse the correlation
found in the teachers’ responses. Because
our technique examines implicit
interpretations, it should be
especially useful for evaluation. Another
application of the method is
in industrial rating. The rater’s 
distribution, when rating many appli- 
cants on many traits, indicates what 
differences he pays attention to and 
how he interprets the traits he is 
supposed to rate. In analyzing one 
rater in this way, we find that he 
regards creative and inquiring as
nearly independent of intelligent, and 
their independent contribution is 
negatively (!) correlated with his 
final recommendation as to hiring. 
So far as we can determine, previous
studies have pooled all raters before
studying trait intercorrelations;² the
study of idiosyncratic rating patterns
should lead to important suggestions
for training raters or improving
scales. Finally, in view of
our interpretation of the perceptual
distribution as an implicit personality
theory, special interest would attach
to studies of ratings given by
clinical psychologists or psychiatrists
of different schools, who might be
presumed to hold different theories.

2 Especially clear evidence on this point is
provided in a recent dissertation by Walker.
(Walker, W. B. An investigation of the effectiveness
of communication between psychologists
and sales executives through personnel
audit reports. Unpublished doctor’s
dissertation, Western Reserve Univer., 1955.)

Effect on accuracy scores. The J’s
distribution of Os has been interpreted
here as a standing system of
meanings which delimits the space
within which he locates Os. It is
obvious that any such delimitation
would affect social perception scores.
Discrepancies between perceived mean and
actual mean lower Stereotype Accuracy,
and Accuracy declines if perceived
variance (ADE, ADI) departs
from an optimal value. The correlational
effects are a bit less easy
to perceive.

Correlations describe the shape of
the distribution of Os. If traits 1
and 2 are uncorrelated, then $x_{o1}$, $x_{o2}$
will have a roughly circular joint
distribution. If a J regards 1 and 2
as correlated, his perceived distribution
of $y_{o1}$, $y_{o2}$ will be elliptical. Perceived
variance along the dimension
1+2 will become greater than in the
true responses, and Accuracy will
suffer. We can view the example in
another way. Suppose the Judge
predicts variate 1 perfectly but believes
that variates 1 and 2 correlate
1.00; then he must have substantial
error in predicting variate 2. He
can predict 2 accurately only if he
perceives the covariance of 1 with 2
accurately.

Data reported by Crow (10, p. 86)
show this phenomenon clearly. He
asked Js to predict what would be
the first word missed by a patient on
a vocabulary test and what would
be the highest level attained (called 
tasks D1 and D2). The correlation
of Js’ Accuracy on D1 with Accuracy 
on D2 was positive and significant 
for five of ten patients, but negative 
and significant on two patients. 
Judges tended to expect a correlation 
between D1 and D2, and they were 
accurate on those patients where the 
two scores were similar. Where the 
scores were dissimilar, Js could not 
be accurate on both predictions. 
There was a rank correlation of .97 
(over Os) between Accuracy, and con- 
sistency of O's performance. 
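
The point that a Judge who assumes a perfect correlation between two variates cannot be accurate on both, unless the Other's two scores in fact agree, can be sketched numerically. The scores below are hypothetical standardized values chosen for illustration; the Judge is assumed to predict variate 1 perfectly and to copy that prediction for variate 2.

```python
def mean_squared_error(predicted, actual):
    """Average squared difference between predictions and actual scores."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical actual scores of four Others on variate 1.
x1 = [-1, -1, 1, 1]
# Two hypothetical worlds for variate 2: uncorrelated with, or equal to, variate 1.
x2_uncorrelated = [-1, 1, -1, 1]
x2_identical = list(x1)

# The Judge predicts variate 1 perfectly and assumes variate 2 tracks it exactly.
y1 = list(x1)
y2 = list(y1)

error_1 = mean_squared_error(y1, x1)
error_2_uncorrelated = mean_squared_error(y2, x2_uncorrelated)
error_2_identical = mean_squared_error(y2, x2_identical)
print(error_1, error_2_uncorrelated, error_2_identical)
```

Under these assumptions the Judge's error on variate 2 is large where the two actual scores disagree and vanishes where they coincide, which parallels the pattern in Crow's data.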

We can provide further illustration 
from the Cornell data. The covari- 
ance between items in self-descrip- 
tions was factored, with the results 
shown in Table 4. This pattern is 
different from that of No. 3 (Table 3) 
in several respects. Notably, No. 3 
overdifferentiates on all items. The 
first factor for No. 3 lumps openness 
and lack of receptiveness; these vari- 
ables are divided among two factors 
in the criterion. In the criterion, 
being at ease (item 6) is positively 
related to openness. It is especially 
interesting that ‘‘feeling like the 
person being interviewed”’ is, for the 
group as a whole, positively correl- 
ated with being at ease; but for No. 3 
these items are negatively corre- 
lated. When his expectancy is so 
discrepant from the facts, it is not 
surprising that No. 3 has poor ac- 
curacy. 

RECOMMENDATIONS 

Studies of perception may be con- 
cerned either with constant processes 
or with variable processes. When 
social perception is regarded (as in 
1, pp. 499-548) as a process of inter- 
preting the expressive cues O pre- 
sents, or of empathizing with him, 
the search is clearly for a variable 
process. The concept of an “intuitive”
perception of Os which underlies
much of the relevant research
implies that J is reacting to the 
particular O as a stimulus, and ig- 
nores the fact that the perceptual 
response also depends on stereotypes 
in J’s mind (cf. Cattell, 5). We have 
seen that the measures currently used 
are affected by both constant and 
reactive processes, and therefore can- 
not serve well to investigate either. 
As Crow states: 


The difficulty stems from failure to recog- 
nize that two meanings of predictive accuracy 
are involved. The use of the correlation scoring
method [either $r_{x_{oi}y_{oij}}$ or $r_{x'_{oi}y'_{oij}}$] defines
predictive accuracy as the ability to vary one’s 
predictions as the actual situation varies. The 
difference score method defines predictive ac- 
curacy as the ability to approximate the actual 
situation. By the difference score method a 
subject is penalized for a systematic error in 
estimation of the magnitude of the actual 
situation. By the correlation method the sub- 
ject is not so penalized. Conversely, a subject 
is penalized by the correlation method if, al- 
though he has approximated the actual situa- 
tion, his predictions do not vary concomi- 
tantly with the actual scores. Each of these 
scoring methods has its advantages and dis- 
advantages. The choice of which technique to 
use will depend on the purpose for which a
study is conducted, although a second basis for 
choice depends upon the empirical relation- 
ship between two procedures (10, p. 57). 


An argument can be presented for 
concentrating attention on constant 
processes, taking up interactions be- 
tween J and O only after the constant 
processes characteristic of J are de- 
pendably measured. Constant proc- 
esses in the perceiver have potentially 
great importance because they affect 
all his acts of perception. Individual 
differences in constant processes need 
to be measured dependably so that 
their influence can be discounted in 
studies of variable processes. More- 
over, identifying constant errors should 
permit training to eliminate such 
biases; this may be the most effective
way to improve the social perception 
of leaders, teachers, and diagnosti- 
cians. 

Not all constant processes are of 
theoretical importance. We venture
to suggest which components of
social perception measures deserve 
attention, recognizing that the ulti- 
mate importance of the components 
depends on whether they relate to 
important criteria. 

1. To some extent, the Elevation 
components (E, DE, DEr) reflect
whether J interprets the words de- 
fining the scale in the same manner 
as Os do. It appears relatively un- 
fruitful, therefore, as a source of in- 
formation on his perception of Os 
(7; 8, p. 463). It should be separately 
measured where it is believed to 
have psychological significance, and 
otherwise eliminated from considera- 
tion. This is consistent with Post- 
man’s view: 

In experiments concerned with the determi- 
nants of perceptual selectivity, the contribu- 
tion of verbal and motor response habits must 
be specifically evaluated and wherever possible 
held constant. The effects of the independent 
variables can then be evaluated against an
empirical baseline defined by the response
habits of the subjects (26, p. 26).


2. The Assumed Similarity meas- 
ures reflect a general orientation 
toward Others. Perhaps the tendency 
to differentiate which these indices 
measure is a reaction shown only in 
the testing situation. But the fact 
that significant behavioral correlates 
have been found for Assumed Simi- 
larity (2, 4, 12, 13, 29) suggests that 
this is a generalized mental set influ- 
encing both test and nontest be- 
havior. Investigators would do well, 
however, to consider Postman’s con- 
clusion that response dispositions can 
be established unambiguously only 
if they are measured by more than 
one type of response (26, p. 27). 

Components of Assumed Similarity 
include Assumed Dispersion in Eleva- 
tion, Assumed Dispersion over Items, 
Assumed Similarity in Elevation, and 
Assumed Self-Typicality. Further 
research is required to determine 
whether these should be measured 
separately or combined.

3. Stereotype Accuracy expresses
how closely J’s implicit picture of 
the generalized Other agrees with 
reality. Differences of this sort are 
probably important. Attention 
should be given to the nature of J's 
errors, as well as to the over-all 
magnitude of the component. 

4. The J's perceptual space, stud- 
ied as a whole, includes not only in- 
formation on his stereotype and his 
assumed dispersion, but also on the 
way in which he organizes the field 
of personality. This type of constant 
cognitive process appears to be a 
most important area for research. 

5. The Differential Elevation Cor- 
relation and the Differential Accuracy 
Correlation are measures of J's sensi- 
tivity to individual differences. These 
measures reflect his ability to inter- 
pret expressive behavior, or his abil- 
ity in differential diagnosis. These 
are the only processes included in
present measures of social perception
which depend on J’s sensitivity to
the particular O. The reliability of
measures of this variable process 
has not been encouraging. But those 
who wish to study “empathy” or 
“social sensitivity” as it has usually
been conceptualized should extract 
these correlational components from 
their measures. 

Social perception research has been 
dominated by simple, operationally 
defined measures. Our analysis has 
shown that any such measure may 
combine and thereby conceal im- 
portant variables, or may depend 
heavily on unwanted components. 
Only by careful subdivision of global 
measures can an investigator hope 
to know what he is dealing with. 
Our analysis makes especially clear 
that the investigator of social per- 
ception must develop more explicit 
theory regarding the constructs he 
intends to study, so that he can re- 
duce his measures to the genuinely 
relevant components.

APPENDIX


To simplify this paper for the reader, we
have placed our detailed mathematical argument
here. In our notation, $x_{oi}$ is the self-description
of Other $o$ on item $i$ ($i = a, b, c, \ldots, k$).
$y_{oij}$ is Judge $j$'s description of $o$ on
$i$. We employ the customary notation for
means: $\bar{x}_{o.}$ indicates the average of $x_{oi}$ over all
items, $\bar{x}_{.i}$ the average over Others, and $\bar{x}_{..}$ the
grand mean over Others and items. Averages
on $y$ are defined similarly.
$x'_{oi} = x_{oi} - \bar{x}_{o.} - \bar{x}_{.i} + \bar{x}_{..}$;
that is, the score $x_{oi}$ is transformed as a
deviation from both item mean and Other
mean. $y'_{oij}$ is defined similarly.

Error in prediction may be measured by
$|y_{oij} - x_{oi}|$. We shall, however, use the squared
difference. This formula is easier to treat 
mathematically than the absolute difference, 
and will ordinarily give similar results. When 
all items are in a Yes-No form, so that the 
error on any prediction is 1 or 0, the two meth- 
ods give identical results. Our measure has 
the important property of being invariant 
under orthogonal rotation of axes (8). 

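The Accuracy with which J perceives all
Others is defined by

$$ACC_j^2 = \frac{1}{kN} \sum_o \sum_i (y_{oij} - x_{oi})^2. \tag{1a}$$

The following identity may be written:

$$\begin{aligned}
y_{oij} - x_{oi} = {} & (\bar{y}_{..j} - \bar{x}_{..}) \\
& + [(\bar{y}_{o.j} - \bar{y}_{..j}) - (\bar{x}_{o.} - \bar{x}_{..})] \\
& + [(\bar{y}_{.ij} - \bar{y}_{..j}) - (\bar{x}_{.i} - \bar{x}_{..})] \\
& + (y'_{oij} - x'_{oi}).
\end{aligned} \tag{2a}$$

When we square and sum, cross-products drop
out and we have the resolution of $ACC_j^2$ into
components:

$$\begin{aligned}
ACC_j^2 = {} & (\bar{y}_{..j} - \bar{x}_{..})^2 \\
& + \frac{1}{N} \sum_o [(\bar{y}_{o.j} - \bar{y}_{..j}) - (\bar{x}_{o.} - \bar{x}_{..})]^2 \\
& + \frac{1}{k} \sum_i [(\bar{y}_{.ij} - \bar{y}_{..j}) - (\bar{x}_{.i} - \bar{x}_{..})]^2 \\
& + \frac{1}{kN} \sum_o \sum_i (y'_{oij} - x'_{oi})^2.
\end{aligned} \tag{3a}$$

In order, these components are called E, DE,
SA, and DA (squares being ignored except in
equations). Each of the three latter terms may
be rewritten as the variance of a difference.
Equations 1 and 2 in the text indicate this
form for DE and SA. We also have

$$DA_{ij}^2 = \sigma^2_{y'_{oij}} + \sigma^2_{x'_{oi}} - 2\,\sigma_{y'_{oij}}\,\sigma_{x'_{oi}}\, r_{y'_{oij}x'_{oi}}. \tag{4a}$$

$DA_{ij}^2$, averaged over items, yields $DA_j^2$. Each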
variance in the formula is taken over Others. 
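
The resolution of Accuracy into components can be checked numerically. The sketch below is one possible reading of the formulas, with N Others and k items; the function name `accuracy_components` and the small data matrices are hypothetical, introduced only for illustration.

```python
def accuracy_components(y, x):
    """Split the mean squared error into E, DE, SA, and DA (as squares).

    y[o][i] is the Judge's prediction and x[o][i] the actual response of
    Other o on item i. Returns (acc, e, de, sa, da).
    """
    n, k = len(x), len(x[0])  # N Others, k items

    def grand(m):
        return sum(sum(row) for row in m) / (n * k)

    def other_means(m):  # average over items, one value per Other
        return [sum(row) / k for row in m]

    def item_means(m):  # average over Others, one value per item
        return [sum(row[i] for row in m) / n for i in range(k)]

    def double_centered(m):  # deviation from both item mean and Other mean
        g, om, im = grand(m), other_means(m), item_means(m)
        return [[m[o][i] - om[o] - im[i] + g for i in range(k)]
                for o in range(n)]

    yg, xg = grand(y), grand(x)
    yo, xo = other_means(y), other_means(x)
    yi, xi = item_means(y), item_means(x)
    yd, xd = double_centered(y), double_centered(x)

    acc = sum((y[o][i] - x[o][i]) ** 2
              for o in range(n) for i in range(k)) / (n * k)
    e = (yg - xg) ** 2
    de = sum(((yo[o] - yg) - (xo[o] - xg)) ** 2 for o in range(n)) / n
    sa = sum(((yi[i] - yg) - (xi[i] - xg)) ** 2 for i in range(k)) / k
    da = sum((yd[o][i] - xd[o][i]) ** 2
             for o in range(n) for i in range(k)) / (n * k)
    return acc, e, de, sa, da


# Hypothetical data: 3 Others by 3 items.
predicted = [[3, 4, 2], [5, 5, 1], [2, 4, 4]]
actual = [[2, 5, 3], [4, 4, 2], [3, 3, 5]]
acc, e, de, sa, da = accuracy_components(predicted, actual)
print(acc, e + de + sa + da)  # the two values agree up to rounding error
```

The four components sum to the overall mean squared error because the four parts of the identity are mutually orthogonal when summed over Others and items.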

Some investigators have preferred to compute
$DA_{oj}^2 = \frac{1}{k}\sum_i (x'_{oi} - y'_{oij})^2$. Averaged over
Others, this also yields $DA_j^2$. Subdivided,
$DA_{oj}^2$ would yield a variance over items for
the Other, and a “Q correlation” over items 
comparing predicted deviations for the Other 
with the actual deviations. This method of 
organizing the data is not recommended, be- 
cause the correlations are critically dependent 
on the factorial content of the items employed 
and on the direction chosen as representing a 
high score on the item. In personality data, 
this direction is frequently arbitrary. 

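Assumed similarity is also defined in terms
of a sum of squared differences:

$$\begin{aligned}
AS_j^2 = \frac{1}{kN} \sum_o \sum_i (y_{oij} - x_{ji})^2 = {} & (\bar{y}_{..j} - \bar{x}_{j.})^2 \\
& + \frac{1}{N} \sum_o (\bar{y}_{o.j} - \bar{y}_{..j})^2 \\
& + \frac{1}{k} \sum_i [(\bar{y}_{.ij} - \bar{y}_{..j}) - (x_{ji} - \bar{x}_{j.})]^2 \\
& + \frac{1}{k} \sum_i \sigma^2_{y'_{oij}}.
\end{aligned} \tag{5a}$$

The components, in order, are AE, ADE,
AST, and ADI. Only AST can be rewritten as
a variance of differences:

$$AST_j^2 = \sigma^2_{\bar{y}_{.ij}} + \sigma^2_{x_{ji}} - 2\,\sigma_{\bar{y}_{.ij}}\,\sigma_{x_{ji}}\, r_{\bar{y}_{.ij}x_{ji}}. \tag{6a}$$

This correlation is referred to as the Self-
Typicality correlation (STr).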

We assume that the goodness of predictions
can be evaluated by the mean square error.
Taking the derivative of each component of
$ACC_j^2$, and setting that derivative equal to
zero, we find that ACC becomes smaller, and
therefore prediction improves, when:

a. J has a typical response set (E approaches
zero).

b. $\sigma_{\bar{y}_{.ij}}$ approaches $r_{\bar{y}_{.ij}\bar{x}_{.i}}\,\sigma_{\bar{x}_{.i}}$. Here the variance
is over items. This means that $\sigma_{\bar{y}_{.ij}}$
should not exceed $\sigma_{\bar{x}_{.i}}$, and should be near zero
if the Stereotype correlation is low. If this correlation
is low, the more J differentiates among
items, the poorer is his Accuracy.

c. $\sigma_{y'_{oij}}$ approaches $r_{y'_{oij}x'_{oi}}\,\sigma_{x'_{oi}}$, the variance being
over Others. This means that $\sigma_{y'_{oij}}$ should
not exceed $\sigma_{x'_{oi}}$, and should be near zero if the
Differential correlation is low. This principle
holds for accuracy of prediction on any single
item, and for predicting elevation. 
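
A short sketch of the minimization behind conditions b and c may be useful. It assumes the component in question is written as the variance of a difference, and it uses $D^2$, $\sigma_y$, $\sigma_x$, and $r$ as shorthand (introduced here for illustration only) for the component, the standard deviation of the predictions, the standard deviation of the actual values, and the correlation between them.

```latex
\begin{aligned}
D^2 &= \sigma_y^2 + \sigma_x^2 - 2 r \sigma_y \sigma_x, \\
\frac{\partial D^2}{\partial \sigma_y} &= 2\sigma_y - 2 r \sigma_x = 0
  \quad\Longrightarrow\quad \sigma_y = r \sigma_x, \\
D^2_{\min} &= \sigma_x^2 \,(1 - r^2).
\end{aligned}
```

Because $r$ cannot exceed 1, the optimal $\sigma_y$ cannot exceed $\sigma_x$, and it shrinks toward zero as $r$ does; for a negative $r$ the smallest attainable error is reached at $\sigma_y = 0$.
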
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In clinical practice, psychologists 
frequently participate in the making 
of vital decisions concerning the 
classification, treatment, prognosis, 
and disposition of individuals. In 
their attempts to increase the num- 
ber of correct classifications and pre- 
dictions, psychologists have de- 
veloped and applied many psycho- 
metric devices, such as patterns of 
test responses as well as cutting 
scores for scales, indices, and sign 
lists. Since diagnostic and prognostic 
statements can often be made with a 
high degree of accuracy purely on the 
basis of actuarial or experience tables 
(referred to hereinafter as base rates), 
a psychometric device, to be efficient, 
must make possible a greater number 
of correct decisions than could be 
made in terms of the base rates alone. 

The efficiency of the great majority 
of psychometric devices reported in 
the clinical psychology literature is 
difficult or impossible to evaluate for 
the following reasons: 

a. Base rates are virtually never 
reported. It is, therefore, difficult 
to determine whether or not a given 
device results in a greater number of 
correct decisions than would be possi- 
ble solely on the basis of the rates 
from previous experience. When,
however, the base rates can be estimated,
the reported claims of efficiency
of psychometric instruments are
often seen to be without foundation.

1 From the Neuropsychiatric Service, VA
Hospital, Minneapolis, Minnesota, and the
Divisions of Psychiatry and Clinical Psychology
of the University of Minnesota
Medical School. The senior author carried
[...] appointment to the Minnesota Center for the [...]

b. In most reports, the distribution 
data provided are insufficient for 
the evaluation of the probable effi- 
ciency of the device in other settings 
where the base rates are markedly 
different. Moreover, the samples 
are almost always too small for the 
determination of optimal cutting 
lines for various decisions. 

c. Most psychometric devices are 
reported without cross-validation 
data. If a psychometric instrument is 
applied solely to the criterion groups 
from which it was developed, its 
reported validity and efficiency are 
likely to be spuriously high, especial- 
ly if the criterion groups are small. 

d. There is often a lack of clarity 
concerning the type of population in 
which a psychometric device can be 
effectively applied. 

e. Results are frequently reported 
only in terms of significance tests 
for differences between groups rather 
than in terms of the number of cor- 
rect decisions for individuals within 
the groups. 

The purposes of this paper are to 
examine current methodology in stud- 
ies of predictive and concurrent 
validity (1), and to present some 
methods for the evaluation of the 
efficiency of psychometric devices as 
well as for the improvement in the 
interpretations made from such devices.
Actual studies reported in the
literature will be used for illustration
wherever possible. It should be
emphasized that these particular illustrative
studies of common practices
were chosen simply because they
contained more complete data than 
are commonly reported, and were 
available in fairly recent publications. 


IMPORTANCE OF BASE RATES 


Danielson and Clark (4) have re- 
ported on the construction and appli- 
cation of a personality inventory 
which was devised for use in military 
induction stations as an aid in detect- 
ing those men who would not com- 
plete basic training because of psy- 
chiatric disability or AWOL recidi- 
vism. One serious defect in their 
article is that it reports cutting lines 
which have not been cross validated. 
Danielson and Clark state that in- 
ductees were administered the Fort 
Ord Inventory within two days after 
induction into the Army, and that 
all of these men were allowed to un- 
dergo basic training regardless of their 
test scores. 

Two samples (among others) of 
these inductees were selected for the 
study of predictive validity: (a) A 
group of 415 men who had made a 
good adjustment (Good Adjustment 
Group), and (b) a group of 89 men 
who were unable to complete basic 
training and who were sufficiently 
disturbed to warrant a recommenda- 
tion for discharge by a psychiatrist 
(Poor Adjustment Group). The authors
state that “the most important
task of a test designed to screen out 
misfits is the detection of the (latter) 
group” (4, p. 139). The authors 
found that their most effective scale 
for this differentiation picked up, at 
a given cutting point, 55% of the 
Poor Adjustment Group (valid posi- 
tives) and 19% of the Good Adjust- 
ment Group (false positives). The 
overlap between these two groups
would undoubtedly have been greater
if the cutting line had been cross 
validated on a random sample from 
the entire population of inductees, but 
for the purposes of the present dis- 
cussion, let us assume that the re- 
sults were obtained from cross-vali- 
dation groups. There is no mention 
of the percentage of all inductees who 
fall into the Poor Adjustment Group, 
but a rough estimate will be adequate 
for the present discussion. Suppose 
that in their population of soldiers, 
as many as 5% make a poor adjust- 
ment and 95% make a good adjust- 
ment. The results for 10,000 cases 
would be as depicted in Table 1. 


TABLE 1

NUMBER OF INDUCTEES IN THE POOR ADJUSTMENT AND GOOD ADJUSTMENT GROUPS
DETECTED BY A SCREENING INVENTORY

(55% valid positives; 19% false positives)

                            Actual Adjustment
                        Poor            Good           Total
Predicted Adjustment    No.     %       No.      %     Predicted
Poor                    275     55      1,805    19     2,080
Good                    225     45      7,695    81     7,920
Total actual            500    100      9,500   100    10,000





Efficiency in detecting poor adjust- 
ment cases. The efficiency of the 
scale can be evaluated in several 
ways. From the data in Table 1 
it can be seen that if the cutting line 
given by the authors were used at 
Fort Ord, the scale could not be used 
directly to “screen out misfits.” If 
all those predicted by the scale to 
make a poor adjustment were screened 
out, the number of false positives 
would be extremely high. Among the 
10,000 potential inductees, 2080 
would be predicted to make a poor 
adjustment. Of these 2080, only 275, 
or 13%, would actually make a poor 
adjustment, whereas the decisions
for 1805 men, or 87% of those
screened out, would be incorrect.

Efficiency in prediction for all cases.
If a prediction were made for every 
man on the basis of the cutting line 
given for the test, 275+7695, or 
7970, out of 10,000 decisions would be 
correct. Without the test, however, 
every man would be predicted to 
make a good adjustment, and 9500 
of the predictions would be correct. 
Thus, use of the test has yielded a 
drop from 95% to 79.7% in the total 
number of correct decisions.

Efficiency in detecting good adjustment
cases. There is one kind of decision
in which the Inventory can
improve on the base rates, however. 
If only those men are accepted who 
are predicted by the Inventory to 
make a good adjustment, 7920 will be 
selected, and the outcome of 7695 
of the 7920, or 97%, will be pre- 
dicted correctly. This is a 2% in- 
crease in hits among predictions of 
“success.” The decision as to whether
or not the scale improves on the base 
rates sufficiently to warrant its use 
will depend on the cost of administer- 
ing the testing program, the adminis- 
trative feasibility of rejecting 21% 
of the men who passed the psychiatric 
screening, the cost to the Army of 
training the 225 maladaptive recruits, 
and the intangible human costs in- 
volved in psychiatric breakdown. 
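
The arithmetic behind Table 1 and the three efficiency figures can be retraced with a short sketch. The 5% base rate is the supposition made in the text; the function name `screening_outcomes` and the variable names are hypothetical, chosen only for this illustration.

```python
def screening_outcomes(n_cases, base_rate, valid_positive_rate, false_positive_rate):
    """Return the fourfold table of a screening cut as counts."""
    n_poor = round(n_cases * base_rate)
    n_good = n_cases - n_poor
    valid_pos = round(n_poor * valid_positive_rate)  # poor, predicted poor
    false_pos = round(n_good * false_positive_rate)  # good, predicted poor
    false_neg = n_poor - valid_pos                   # poor, predicted good
    valid_neg = n_good - false_pos                   # good, predicted good
    return valid_pos, false_pos, false_neg, valid_neg


valid_pos, false_pos, false_neg, valid_neg = screening_outcomes(
    10_000, 0.05, 0.55, 0.19
)
n_cases = valid_pos + false_pos + false_neg + valid_neg

# Proportion of correct decisions among those predicted to adjust poorly.
hits_among_predicted_poor = valid_pos / (valid_pos + false_pos)
# Proportion of correct decisions when the cutting line is applied to everyone.
hits_overall = (valid_pos + valid_neg) / n_cases
# Proportion correct when everyone is predicted to make a good adjustment.
hits_base_rate_only = (false_pos + valid_neg) / n_cases
# Proportion of correct decisions among those predicted to adjust well.
hits_among_predicted_good = valid_neg / (false_neg + valid_neg)
```

Changing the assumed base rate in the call shows how strongly each of these proportions depends on it, which is the point of the section.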
Populations to which the scale is 
applied. In the evaluation of the 
efficiency of any psychometric in- 
strument, careful consideration must 
be given to the types of populations 
to which the device is to be applied. 
Danielson and Clark have stated 
that ‘‘since the final decision as to 
disposition is made by the psychia- 
trist, the test should be classified as 
a screening adjunct” (4, p. 138). 
This statement needs clarification, 
however, for the efficiency of the 
scale can vary markedly according 
to the different ways in which it 
might be used as an adjunct. 

It will be noted that the test was 
administered to men who were al- 
ready in the Army, and not to men 
being examined for induction. The 
reported validation data apply, 
therefore, specifically to the popula- 
tion of recent inductees. The results 
might have been somewhat different 
if the population tested consisted of 
potential inductees. For the sake of 
illustration, however, let us assume 
that there is no difference in the test 
results of the two populations. 

An induction station psychiatrist 
can use the scale cutting score in 
one or more of the following ways, 
i.e., he can apply the scale results to a 
variety of populations. (a) The psy- 
chiatrist’s final decision to accept or 
reject a potential inductee may be 
based on both the test score and his 
usual interview procedure. The popu- 
lation to which the test scores are 
applied is, therefore, potential in- 
ductees interviewed by the usual pro- 
cedures for whom no decision was made. 
(b) He may evaluate the potential 
inductee according to his usual pro- 
cedures, and then consult the test 
score only if the tentative decision 
is to reject. That is, a decision to 
accept is final. The population to 
which the test scores are applied is 
potential inductees tentatively rejected 
by the usual interview procedures. (c) 
An alternative procedure is for the 
psychiatrist to consult the test score 
only if the tentative decision is to 
accept, the population being potential 
inductees tentatively accepted by the 
usual interview procedures. The de- 
cision to reject is final. (d) Probably 
the commonest proposal for the use 
of tests as screening adjuncts is that 
the more skilled and costly psychia- 
tric evaluation should be made only 
upon the test positives, i.e., induc- 
tees classified by the test as good 
risks are not interviewed, or are sub- 
jected only to a very short and super- 
ficial interview. Here the population 
is all potential inductees, the test being 
used to make either a final decision 
to “accept” or a decision to “exam- 
ine.” 

Among these different procedures, 
how is the psychiatrist to achieve 
maximum effectiveness in using the 
test as an adjunct? There is no an- 
swer to this question from the avail- 
able data, but it can be stated defi- 
nitely that the data reported by 
Danielson and Clark apply only to 
the third procedure described above. 
The test results are based on a se- 
lected group of men accepted for in- 
duction and not on a random sample 
of potential inductees. If the scale 
is used in any other way than the 
third procedure mentioned above, the 
results may be considerably inferior 
to those reported, and, thus, to the 
use of the base rates without the 
test.* 

The principles discussed thus far, 
although illustrated by a single study, 
can be generalized to any study of 
predictive or concurrent validity. It 
can be seen that many considerations 
are involved in determining the 
efficiency of a scale at a given cut- 
ting score, especially the base rates of 
the subclasses within the population 
to which the psychometric device 
is to be applied. In a subsequent 
portion of this paper, methods will 
be presented for determining cutting 
points for maximizing the efficiency 
of the different types of decisions 
which are made with psychometric 
devices. 

* Goodman (8) has discussed this same 
problem with reference to the supplementary 
use of an index for the prediction of parole 
violation. 

Another study will be utilized to 
illustrate the importance of an explicit 
statement of the base rates of popu- 
lation subgroups to be tested with a 
given device. Employing an interest- 
ing configural approach, Thiesen 
(18) discovered five Rorschach pat- 
terns, each of which differentiated 
well between 60 schizophrenic adult 
patients and a sample of 157 gainfully 
employed adults. The best differen- 
tiator, considering individual pat- 
terns or number of patterns, was 
Pattern A, which was found in 20% 
of the patients’ records and in only 
.6% of the records of normals. Thie- 
sen concludes that if these patterns 
stand the test of cross validation, 
they might have “clinical usefulness” 
in early detection of a schizophrenic 
process or as an aid to determining 
the gravity of an initial psychotic 
episode (18, p. 369). If by “clinical 
usefulness” is meant efficiency in a 
clinic or hospital for the diagnosis of 
schizophrenia, it is necessary to 
demonstrate that the patterns dif- 
ferentiate a higher percentage of 
schizophrenic patients from other 
diagnostic groups than could be cor- 
rectly classified without any test 
at all, i.e., solely on the basis of the 
rates of various diagnoses in any 
given hospital. If a test is to be used 
in differential diagnosis among psy- 
chiatric patients, evidence of its 
efficiency for this function cannot be 
established solely on the basis of dis- 
crimination of diagnostic groups from 
normals. If by “clinical usefulness” 
Thiesen means that his data indicate 
that the patterns might be used to 
detect an early schizophrenic process 
among nonhospitalized gainfully em- 
ployed adults, he would do better to 
discard his patterns and use the base 
rates, as can be seen from the follow- 
ing data. 

Taulbee and Sisson (17) cross vali- 
dated Thiesen’s patterns on schizo- 
phrenic patient and normal samples, 
and found that Pattern A was the 
best discriminator. Among patients, 
8.1% demonstrated this pattern and 
among normals, none had this pat- 
tern. There are approximately 60 
million gainfully employed adults in 
this country, and it has been esti- 
mated that the rate of schizophrenia 
in the general population is approxi- 
mately .85% (2, p. 558). The results 
for Pattern A among a population 
of 10,000 gainfully employed adults 
would be as shown in Table 2. In 
order to detect 7 schizophrenics, it 
would be necessary to test 10,000 indi- 
viduals. 


TABLE 2 

NUMBER OF PERSONS CLASSIFIED AS SCHIZOPHRENIC AND NORMAL BY A TEST PATTERN 
AMONG A POPULATION OF GAINFULLY EMPLOYED ADULTS 

(8.1% valid positives; 0.0% false positives) 

                         Criterion Classification 
Classification       Schizophrenia          Normal         Total Classified 
by Test               No.      %         No.      %            by Test 

Schizophrenia           7     8.1          0      0                  7 
Normal                 78    91.9      9,915    100              9,993 
Total in class         85   100        9,915    100             10,000 




















In the Neurology service of a hospi- 
tal a psychometric scale is used which 
is designed to differentiate between 
patients with psychogenic and organ- 
ic low back pain (9). At a given cut- 
ting point, this scale was found to 
classify each group with approxi- 
mately 70% effectiveness upon cross 
validation, i.e., 70% of cases with 
no organic findings scored above an 
optimal cutting score, and 70% of 
surgically verified organic cases 
scored below this line. Assume that 
90% of all patients in the Neurology 
service with a primary complaint of 
low back pain are in fact “organic.” 
Without any scale at all the psychol- 
ogist can say every case is organic, 
and be right 90% of the time. With 
the scale the results would be as 
shown in Section A of Table 3. Of 
10 psychogenic cases, 7 score above 
the line; of 90 organic cases, 63 score 
below the cutting line. If every case 
above the line is called psychogenic, 
only 7 of 34 will be classified correctly 
or about 21%. Nobody wants to be 
right only one out of five times in this 
type of situation, so that it is ob- 
vious that it would be imprudent to 
call a patient psychogenic on the basis 
of this scale. Radically different 
results occur in prediction for cases 
below the cutting line. Of 66 cases 
63, or 95%, are correctly classified 
as organic. Now the psychologist 
has increased his diagnostic hits from 
90 to 95% on the condition that he 
labels only cases falling below the 
line, and ignores the 34% scoring 
above the line. 


TABLE 3 

NUMBER OF PATIENTS CLASSIFIED AS PSYCHOGENIC AND ORGANIC ON A LOW BACK PAIN 
SCALE WHICH CLASSIFIES CORRECTLY 70% OF PSYCHOGENIC AND ORGANIC CASES 

                          Actual Diagnosis 
Classification                                     Total Classified 
by Scale              Psychogenic    Organic           by Scale 

A. Base Rates in Population Tested: 90% Organic; 10% Psychogenic 

Psychogenic                 7           27                34 
Organic                     3           63                66 
Total diagnosed            10           90               100 

B. Base Rates in Population Tested: 90% Psychogenic; 10% Organic 

Psychogenic                63            3                66 
Organic                    27            7                34 
Total diagnosed            90           10               100 





In actual practice, the psychologist 
may not, and most likely will not, 
test every low back pain case. Prob- 
ably those referred for testing will be 
a select group, i.e., those who the 
neurologist believes are psychogenic 
because neurological findings are mini- 
mal or absent. This fact changes the 
population from “all patients in 
Neurology with a primary complaint 
of low back pain,” to “all patients 
in Neurology with a primary com- 
plaint of low back pain who are re- 
ferred for testing.” Suppose that 
a study of past diagnoses indicated 
that of patients with minimal or ab- 
sent findings, 90% were diagnosed as 
psychogenic and 10% as organic. 
Section B of Table 3 gives an entirely 
different picture of the effectiveness 
of the low back pain scale, and new 
limitations on interpretation are ne- 
cessary. Now the scale correctly 
classifies 95% of all cases above the 
line as psychogenic (63 of 66), and 
is correct in only 21% of all cases 
below the line (7 of 34). In this 
practical situation the psychologist 
would be wise to refrain from inter- 
preting a low score. 
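
Both sections of Table 3 can be generated from the same 70% hit rates by changing only the base rate. The sketch below assumes, as in the text, that "positive" means psychogenic and that 100 cases are tested; the function names are illustrative.

```python
def fourfold(base_rate_pos, valid_pos_rate, false_pos_rate, n=100):
    """Expected counts when a sign is applied to n cases.

    'Positive' is the condition the sign is scored for (here: psychogenic).
    """
    pos = base_rate_pos * n
    neg = n - pos
    vp = pos * valid_pos_rate      # positives called positive
    fp = neg * false_pos_rate      # negatives called positive
    return {"valid_pos": vp, "false_pos": fp,
            "false_neg": pos - vp, "valid_neg": neg - fp}

def hit_rates(t):
    """Proportion correct among cases called positive and called negative."""
    called_pos = t["valid_pos"] + t["false_pos"]
    called_neg = t["valid_neg"] + t["false_neg"]
    return t["valid_pos"] / called_pos, t["valid_neg"] / called_neg

# The scale classifies 70% of each group correctly, so p1 = .70 and p2 = .30.
section_a = fourfold(0.10, 0.70, 0.30)   # 10% psychogenic
section_b = fourfold(0.90, 0.70, 0.30)   # 90% psychogenic

for name, t in (("A", section_a), ("B", section_b)):
    above, below = hit_rates(t)
    print(f"Section {name}: above the line {above:.2f}, below the line {below:.2f}")
```

With the scale itself unchanged, the trustworthy side of the cutting line switches from "below" in Section A to "above" in Section B.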

From the above illustrations it 
can be seen that the psychologist in 
interpreting a test and in evaluating 
its effectiveness must be very much 
aware of the population and its 
subclasses and the base rates of the 
behavior or event with which he is 
dealing at any given time. 

It may be objected that no clini- 
cian relies on just one scale but would 
diagnose on the basis of a configura- 
tion of impressions from several 
tests, clinical data and history. We 
must, therefore, emphasize that the 
preceding single-scale examples were 
presented for simplicity only, but that 
the main point is not dependent upon 
this “atomism.” Any complex con- 
figurational procedure in any number 
of variables, psychometric or otherwise, 
eventuates in a decision. Those de- 
cisions have a certain objective suc- 
cess rate in criterion case identifica- 
tion; and for present purposes we 
simply treat the decision function, 
whatever its components and com- 
plexity may be, as a single variable. 
It should be remembered that the 
literature does not present us with 
cross-validated methods having hit 
rates much above those we have 
chosen as examples, regardless of 
how complex or configural the meth- 
ods used. So that even if the clinician 
approximates an extremely complex 
configural function “in his head” 
before classifying the patient, for 
purposes of the present problem this 
complex function is treated as the 
scale. In connection with the more 
general “philosophy” of clinical de- 
cision making see Bross (3) and Meehl 
(12). 


APPLICATIONS OF BAYES’ 
THEOREM 


Many readers will recognize the 
preceding numerical examples as 
essentially involving a principle of 
elementary probability theory, the 
so-called “Bayes’ Theorem.” While 
it has come in for some opprobrium 
on account of its connection with 
certain pre-Fisherian fallacies in sta- 
tistical inference, as an algebraic 
statement the theorem has, of course, 
nothing intrinsically wrong with it 
and it does apply in the present case. 
One form of it may be stated as fol- 
lows: 

If there are k antecedent condi- 
tions under which an event of a given 
kind may occur, these conditions hav- 
ing the antecedent probabilities $P_1$, 
$P_2, \ldots, P_k$ of being realized, and 
the probability of the event upon 
each of them is $p_1, p_2, p_3, \ldots, p_k$; 
then, given that the event is observed 
to occur, the probability that it 
arose on the basis of a specified one, 
say $j$, of the antecedent conditions 
is given by 

$$P_{j(o)} = \frac{P_j p_j}{\sum_{i=1}^{k} P_i p_i}$$


The usual illustration is the case 
of drawing marbles from an urn. 
Suppose we have two urns, and the 
urn-selection procedure is such that 
the probability of our choosing the 
first urn is 1/10 and the second 9/10. 
Assume that 70% of the marbles 
in the first urn are black, and 40% of 
those in the second urn are black. I 
now (blindfolded) “choose” an urn 
and then, from it, I choose a marble. 
The marble turns out to be black. 
What is the probability that I drew 
from the first urn? 


$$P_1 = .10 \qquad P_2 = .90$$
$$p_1 = .70 \qquad p_2 = .40$$

Then 

$$\frac{(.10)(.70)}{(.10)(.70)+(.90)(.40)} = \frac{.07}{.43} = .163$$
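
The urn calculation is a direct application of the theorem as stated for k antecedent conditions. A minimal sketch follows; the function name `posterior` is an illustrative assumption, not notation from the paper.

```python
def posterior(priors, likelihoods, j):
    """Probability that an observed event arose from condition j.

    priors      -- antecedent probabilities P_1 ... P_k (should sum to 1)
    likelihoods -- probability p_i of the event under each condition
    j           -- zero-based index of the condition of interest
    """
    joint = [P * p for P, p in zip(priors, likelihoods)]
    return joint[j] / sum(joint)

# Two urns: chosen with probability .10 and .90; 70% and 40% black marbles.
first_urn_given_black = posterior([0.10, 0.90], [0.70, 0.40], 0)
print(round(first_urn_given_black, 3))
```

Swapping the two priors while leaving the marble proportions untouched raises the same posterior to about .94, which is the base-rate effect the clinical examples rely on.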


If I make a practice of inferring under 
such circumstances that an observed 
black marble arose from the first 
urn, I shall be correct in such judg- 
ments, in the long run, only 16.3% 
of the time. Note, however, that the 
“test item” or “sign” black marble 
is correctly “scored” in favor of Urn 
No. 1, since there is a 30% difference 
in black marble rate between it and 
Urn No. 2. But this considerable 
disparity in symptom rate is over- 
come by the very low base rate 
(“antecedent probability of choosing 
from the first urn”), so that inference 
to first-urn origin of black marbles 
will actually be wrong some 84 times 
in 100. In the clinical analogue, the 
urns are identified with the subpopu- 
lations of patients to be discriminated 
(their antecedent probabilities being 
equated to their base rates in the 
population to be examined), and the 
black marbles are test results of a 
certain (“positive”) kind. The pro- 
portion of black marbles in one urn 
is the valid positive rate, and in the 
other is the false positive rate. In- 
spection and suitable manipulations 
of the formula for the common two- 
category case, viz., 

$$P_{d(o)} = \frac{P p_1}{P p_1 + Q p_2}$$

$P_{d(o)}$ = Probability that an individ- 
    ual is diseased, given that 
    his observed test score is 
    positive 

$P$ = Base rate of actual positives 
    in the population examined 

$P + Q = 1$ 

$p_1$ = Proportion of diseased iden- 
    tified by test (“valid posi- 
    tive” rate) 

$q_1 = 1 - p_1$ 

$p_2$ = Proportion of nondiseased 
    misidentified by test as being 
    diseased (“false positive” 
    rate) 

$q_2 = 1 - p_2$ 

yields several useful statements. Note 
that in what follows we are operating 
entirely with exact population param- 
eter values; i.e., sampling errors 
are not responsible for the dangers 
and restrictions set forth. See Table 
4. 

1. In order for a positive diagnostic 
assertion to be “more likely true than 
false,” the ratio of the positive to the 
negative base rates in the examined 
population must exceed the ratio of 
the false positive rate to the valid 
positive rate. That is, 

$$\frac{P}{Q} > \frac{p_2}{p_1}$$

If this condition is not met, the 
attribution of pathology on the basis 
of the test is more probably in error 
than correct, even though the sign 
being used is valid (i.e., $p_1 \neq p_2$). 


TABLE 4 

DEFINITION OF SYMBOLS 

Diagnosis                            Actual Diagnosis 
from Test            Positive                      Negative 

Positive             $p_1$                         $p_2$ 
                     Valid positive rate           False positive rate 
                     (Proportion of positives      (Proportion of negatives 
                     called positive)              called positive) 

Negative             $q_1$                         $q_2$ 
                     False negative rate           Valid negative rate 
                     (Proportion of positives      (Proportion of negatives 
                     called negative)              called negative) 

Total with           $p_1 + q_1 = 1.0$             $p_2 + q_2 = 1.0$ 
actual diagnosis     (Total positives)             (Total negatives) 

Note. For simplicity, the term “diagnosis” is used 
to denote the classification of any kind of pathology, be- 
havior, or event being studied, or to denote “outcome” 
if a test is used for prediction. Since horizontal addition 
(e.g., $p_1 + p_2$) is meaningless in ignorance of the base 
rates, there is no symbol or marginal total for these 
sums. All values are parameter values. 

Example: If a certain cutting score 
identifies 80% of patients with organ- 
ic brain damage (high scores being 
indicative of damage) but is also ex- 
ceeded by 15% of the nondamaged 
sent for evaluation, in order for the 
psychometric decision “brain dam- 
age present” to be more often true 
than false, the ratio of actually brain- 
damaged to nondamaged cases among 
all seen for testing must be at least 
one to five (.19). 
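
The threshold in this example can be checked in a few lines. The sketch below restates the condition P/Q > p2/p1 as a minimum base rate, P > p2/(p1 + p2), which follows by substituting Q = 1 - P; the function names are illustrative assumptions.

```python
def required_odds(valid_pos_rate, false_pos_rate):
    """Ratio P/Q that must be exceeded for a positive call to be
    more often true than false: p2 / p1."""
    return false_pos_rate / valid_pos_rate

def min_base_rate(valid_pos_rate, false_pos_rate):
    """The same condition expressed as a base rate: P > p2 / (p1 + p2)."""
    return false_pos_rate / (valid_pos_rate + false_pos_rate)

# Brain-damage example: the cut identifies 80% of the damaged and is
# exceeded by 15% of the nondamaged.
print(round(required_odds(0.80, 0.15), 4))
print(round(min_base_rate(0.80, 0.15), 3))
```

The first figure is the ratio of about one to five cited in the text; the second says the same thing as a proportion, namely that roughly 16% or more of those tested must actually be brain-damaged.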

Piotrowski has recommended that 
the presence of 5 or more Rorschach 
signs among 10 “organic” signs is an 
efficient indicator of brain damage. 
Dorken and Kral (5), in cross validat- 
ing Piotrowski’s index, found that 
63% of organics and 30% of a mixed, 
nonorganic, psychiatric patient group 
had Rorschachs with 5 or more 
signs. Thus, our estimate of $p_2/p_1$ 
= .30/.63 = .48, and in order for the 
decision “brain damage present” to 
be correct more than one-half the 
time, the proportion of positives ($P$) 
in a given population must exceed 
.33 (i.e., $P/Q > .33/.67$). Since few 
clinical populations requiring this 
clinical decision would have such a 
high rate of brain damage, especially 
among psychiatric patients, the par- 
ticular cutting score advocated by 
Piotrowski will produce an excessive 
number of false positives, and the 
positive diagnosis will be more often 
wrong than right. Inasmuch as the 
base rates for any given behavior or 
pathology differ from one clinical 
setting to another, an inflexible cut- 
ting score should not be advocated for 
any psychometric device. This state- 
ment applies generally—thus, to 
indices recommended for such di- 
verse purposes as the classification 
or detection of deterioration, specific 
symptoms, “traits,” neuroticism, sex- 
ual aberration, dissimulation, sui- 
cide risk, and the like. When P is 
small, it may be advisable to explore 
the possibility of dealing with a re- 
stricted population within which the 
base rate of the attribute being tested 
is higher. This approach is discussed 
in an article by Rosen (14) on the 
detection of suicidal patients in which 
it is suggested that an attempt might 
be made to apply an index to sub- 
populations with higher suicide rates. 

2. If the base rates are equal, the 
probability of a positive diagnosis 
being correct is the ratio of valid 
positive rate to the sum of valid 
and false positive rates. That is, 

$$P_{d(o)} = \frac{p_1}{p_1 + p_2} \quad \text{if } P = Q = \tfrac{1}{2}.$$


Example: If our population is 
evenly divided between neurotic and 
psychotic patients the condition for 
being “probably right” in diagnosing 
psychosis by a certain method is 
simply that the psychotics exhibit 
the pattern in question more fre- 
quently than the neurotics. This is 
the intuitively obvious special case; 
it is often misgeneralized to justify 
use of the test in those cases where 
base-rate asymmetry ($P \neq Q$) coun- 
teracts the ($p_1 - p_2$) discrepancy, lead- 
ing to the paradoxical consequence 
that deciding on the basis of more in- 
formation can actually worsen the 
chances of a correct decision. The 
apparent absurdity of such an idea 
has often misled psychologists into 
behaving as though the establish- 
ment of “validity” or “discrimina- 
tion,” i.e., that $p_1 \neq p_2$, indicates 
that a procedure should be used in 
decision making. 

Example: A certain test is used 
to select those who will continue in 
outpatient psychotherapy (positives). 
It correctly identifies 75% of these 
good cases but the same cutting 
score picks up 40% of the poor risks 
who subsequently terminate against 
advice. Suppose that in the past 
experience of the clinic 50% of the 
patients terminated therapy pre- 
maturely. Correct selection of pa- 
tients can be made with the given 
cutting score on the test 65% of the 
time, since $p_1/(p_1 + p_2) = .75/(.75 + .40) = .65$. 
It can be seen that the 
efficiency of the test would be exag- 
gerated if the base rate for continua- 
tion in therapy were actually .70, 
but the efficiency were evaluated 
solely on the basis of a research study 
containing equal groups of continuers 
and noncontinuers, i.e., if it were 
assumed that P =.50. 
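
One way to read this example is to compare, at each base rate, the hits obtained with the cutting score against the hits obtained by calling every patient a member of the larger class. The sketch below is illustrative only: the function names are assumptions, p2 is taken to be the 40% of poor risks picked up by the cut, and over-all hits are computed as Pp1 + Qq2.

```python
def hits_among_positives(P, p1, p2):
    """Probability that a test-positive case is an actual positive."""
    Q = 1 - P
    return P * p1 / (P * p1 + Q * p2)

def overall_hits(P, p1, p2):
    """Proportion of all decisions that are correct: P*p1 + Q*q2."""
    Q = 1 - P
    return P * p1 + Q * (1 - p2)

p1, p2 = 0.75, 0.40
for P in (0.50, 0.70):
    # Gain over simply assigning every case to the larger class.
    gain = overall_hits(P, p1, p2) - max(P, 1 - P)
    print(P, round(hits_among_positives(P, p1, p2), 3), round(gain, 3))
```

Under these assumptions the gain in over-all hits is about .175 when the groups are equal but only about .005 when 70% of patients continue, so a study built on equal groups would overstate what the test adds in the clinic.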

3. In order for the hits in the en- 
tire population which is under con- 
sideration to be increased by use of 
the test, the base rate of the more 
numerous class (called here positive) 
must be less than the ratio of the 
valid negative rate to the sum of 
valid negative and false negative 
rates. That is, unless 

$$P < \frac{q_2}{q_1 + q_2}$$

the making of decisions on the basis 
of the test will have an adverse 
effect. An alternative expression is 
that $(P/Q) < (q_2/q_1)$ when $P > Q$, i.e., 
the ratio of the larger to the smaller 
class must be less than the ratio of 
the valid negative rate to the false 
negative rate. When $P < Q$, the con- 
ditions for the test to improve upon 
the base rates are: 

$$Q < \frac{p_1}{p_1 + p_2}$$

and 

$$\frac{Q}{P} < \frac{p_1}{p_2}.$$


Rotter, Rafferty, and Lotsof (15) 
have reported the scores on a sentence 
completion test for a group of 33 
“maladjusted” and 33 “adjusted” 
girls. They report that the use of a 
specified cutting score (not cross 
validated) will result in the correct 
classification of 85% of the malad- 
justed girls and the incorrect classifi- 
cation of only 15% of the adjusted 
girls. It is impossible to evaluate 
adequately the efficiency of the test 
unless one knows the base rates of 
maladjustment (P) and adjustment 
(Q) for the population of high school 
girls, although there would be general 
agreement that $Q > P$. Since 
$p_1/(p_1 + p_2) = .85/(.85 + .15) = .85$, the over- 
all hits in diagnosis with the test 
will not improve on classification 
based solely on the base rates unless 
the proportion of adjusted girls is 
less than .85. Because the reported 
effectiveness of the test is spuriously 
high, the proportion of adjusted 
girls would no doubt have to be 
considerably less than .85. Unless 
there is good reason to believe that 
the base rates are similar from one 
setting to another, it is impossible to 
determine the efficiency of a test 
such as Rotter’s when the criterion is 
based on ratings unless one replicates 
his research, including the criterion 
ratings, with a representative sample 
of each new population. 
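
The condition in statement 3 amounts to asking whether the over-all hits with the test, Pp1 + Qq2, exceed the larger of the two base rates. The sketch below applies that comparison to the sentence completion example, taking maladjustment as the positive class; the function name and the trial base rates are illustrative assumptions, not values from the study.

```python
def beats_base_rates(P, p1, p2):
    """True if deciding by the test yields more over-all hits than
    assigning every case to the more numerous class."""
    Q = 1 - P
    hits_with_test = P * p1 + Q * (1 - p2)
    return hits_with_test > max(P, Q)

# Reported rates: 85% of maladjusted girls correctly classified (p1),
# 15% of adjusted girls incorrectly classified (p2).
p1, p2 = 0.85, 0.15

# Hypothetical proportions of adjusted girls (Q) in the population tested.
for Q in (0.80, 0.90):
    print(Q, beats_base_rates(1 - Q, p1, p2))
```

With 80% adjusted girls the test helps; with 90% it does worse than calling every girl adjusted, in line with the .85 limit given in the text.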

4. In altering a sign, improving a 
scale, or shifting a cutting score, the 
increment in valid positives per incre- 
ment in valid positive rate is propor- 
tional to the positive base rate; and 
analogously, the increment in valid 
negatives per increment in valid 
negative rate is proportional to the 
negative base rate. That is, if we 
alter a sign the net improvement in 
over-all hit rate is 


$$H'_T - H_T = \Delta p_1 P + \Delta q_2 Q,$$

where $H_T$ = original proportion of 
hits (over-all) and $H'_T$ = new propor- 
tion of hits (over-all). 

5. A corollary of this is that alter- 
ing a sign or shifting a cut will im- 
prove our decision making if, and 
only if, the ratio of improvement 
$\Delta p_1$ in valid positive rate to worsening 
$\Delta p_2$ in false positive rate exceeds the 
ratio of actual negatives to positives 
in the population. 

$$\frac{\Delta p_1}{\Delta p_2} > \frac{Q}{P}$$

Example: Suppose we improve the 
intrinsic validity of a certain “schizo- 
phrenic index” so that it now de- 
tects 20% more schizophrenics than 
it formerly did, at the expense of 
only a 5% increase in the false posi- 
tive rate. This surely looks encourag- 
ing. We are, however, working with 
an outpatient clientele only 1/10th 
of whom are actually schizophrenic. 
Then, since 

$$\Delta p_1 = .20 \qquad P = .10$$
$$\Delta p_2 = .05 \qquad Q = .90$$

applying the formula we see that 

$$\frac{.20}{.05} \not> \frac{.90}{.10}$$

i.e., the required inequality does not 
hold, and the routine use of this 
“improved” index will result in an 
increase in the proportion of errone- 
ous diagnostic decisions. 
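
The size of the loss in this example can be computed directly from the change in over-all hit rate, which is the gain in valid positives weighted by P less the added false positives weighted by Q. A minimal sketch, with an illustrative function name:

```python
def change_in_hit_rate(d_p1, d_p2, P):
    """Net change in over-all hits when the valid positive rate rises by
    d_p1 and the false positive rate rises by d_p2, at base rate P."""
    Q = 1 - P
    return d_p1 * P - d_p2 * Q

# "Improved" schizophrenic index in a clientele that is 1/10th schizophrenic.
print(round(change_in_hit_rate(0.20, 0.05, 0.10), 3))

# The same alteration pays off only where P exceeds .20, i.e. Q/P < .20/.05.
print(round(change_in_hit_rate(0.20, 0.05, 0.25), 4))
```

The first result is negative (a drop of about 2.5 percentage points in correct decisions); the second, at an assumed base rate of .25, is positive.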

In the case of any pair of unimodal 
distributions, this corresponds to the 
principle that the optimal cut lies 
at the intersection of the two distribu- 
tion envelopes (11, pp. 271-272). 


MANIPULATION OF CUTTING 
LINES FOR DIFFERENT 
DECISIONS 


For any given psychometric de- 
vice, no one cutting line is maximally 
efficient for clinical settings in which 
the base rates of the criterion groups 
in the population are different. Fur- 
thermore, different cutting lines may 
be necessary for various decisions 
within the same population. In this 
section, methods are presented for 
manipulating the cutting line of any 
instrument in order to maximize 
the efficiency of a device in the mak- 
ing of several kinds of decisions. 
Reference should be made to the 
scheme presented in Table 5 for un- 
derstanding of the discussion which 
follows. This scheme and the meth- 
ods for manipulating cutting lines 
are derived from Duncan, Ohlin, 
Reiss, and Stanton (6). 

TABLE 5 

SYMBOLS TO BE USED IN EVALUATING THE EFFICIENCY OF A PSYCHOMETRIC DEVICE 
IN CLASSIFICATION OR PREDICTION 

Diagnosis                           Actual Diagnosis                     Total Diagnosed 
from Test            Positive                   Negative                   from Test 

Positive             $NPp_1$                    $NQp_2$                    $NPp_1 + NQp_2$ 
                     (Number of valid           (Number of false           (Number of test 
                     positives)                 positives)                 positives) 

Negative             $NPq_1$                    $NQq_2$                    $NPq_1 + NQq_2$ 
                     (Number of false           (Number of valid           (Number of test 
                     negatives)                 negatives)                 negatives) 

Total with           $NP$                       $NQ$                       $N$ 
actual diagnosis     (Number of actual          (Number of actual          (Total number 
                     positives)                 negatives)                 of cases) 

Note. For simplicity, the term “diagnosis” is used to denote the classification of any kind of pathology, be- 
havior, or event studied, or to denote “outcome” if a test is used for prediction. “Number” means absolute frequency, 
not rate or probability. 

A study in the prediction of juve- 
nile delinquency by Glueck and Glueck 
(7) will be used for illustration. 
Scores on a prediction index for 451 
delinquents and 439 nondelinquents 
(7, p. 261) are listed in Table 6. If 
the Gluecks’ index is to be used in a 
population with a given juvenile 
delinquency rate, cutting lines can 
be established to maximize the effi- 
ciency of the index for several de- 
cisions. In the following illustration, 
a delinquency rate of .20 will be used. 
From the data in Table 6, optimal 
cutting lines will be determined for 
maximizing the proportion of correct 
predictions, or hits, for all cases 
($H_T$), and for maximizing the pro- 
portion of hits ($H_P$) among those 
called delinquent (positives) by the 
index. 

In the first three columns of Table 
6, “f” denotes the number of de- 
linquents scoring in each class inter- 
val, “cf” represents the cumulative 
frequency of delinquents scoring 
above each class interval (e.g., 265 
score above 299), and $p_1$ represents 
the proportion of the total group 
of 451 delinquents scoring above 
each class interval. Columns 4, 5, 
and 6 present the same kind of data 
for the 439 nondelinquents. 

TABLE 6 

PREDICTION INDEX SCORES FOR JUVENILE DELINQUENTS AND NONDELINQUENTS AND 
OTHER STATISTICS FOR DETERMINING OPTIMAL CUTTING LINES FOR CERTAIN 
DECISIONS IN A POPULATION WITH A DELINQUENCY RATE OF .20* 

Prediction        Delinquents           Nondelinquents 
Index                                                           $1-p_2$  $.2p_1$  $.8p_2$  $.8q_2$ 
Score          f     cf    $p_1$      f     cf    $p_2$     $q_2$    $Pp_1$   $Qp_2$   $Qq_2$   $H_T$    $R_P$    $H_P$ 
              (1)   (2)    (3)       (4)   (5)    (6)       (7)      (8)      (9)      (10)     (11)     (12)     (13) 

400+           51    51   .1131        1     1   .0023     .9977    .0226    .0018    .7982    .8208    .0244    .926 
350-399        73   124   .2749        8     9   .0205     .9795    .0550    .0164    .7836    .8386    .0714    .770 
300-349       141   265   .5876       23    32   .0729     .9271    .1175    .0583    .7417    .8592    .1758    .668 
250-299       122   387   .8581       70   102   .2323     .7677    .1716    .1858    .6142    .7858    .3574    .480 
200-249        40   427   .9468       68   170   .3872     .6128    .1894    .3098    .4902    .6796    .4992    .379 
150-199        19   446   .9889      102   272   .6196     .3804    .1978    .4957    .3043    .5021    .6935    .285 
<150            5   451  1.0000      167   439  1.0000     .0000    .2000    .8000    .0000    .2000   1.0000    .200 

* Frequencies in columns 1 and 4 are from Glueck and Glueck (7, p. 261). 

Maximizing the number of correct 
predictions or classifications for all 
cases. The proportion of correct 
predictions or classifications ($H_T$) for 
any given cutting line is given by the 
formula, $H_T = Pp_1 + Qq_2$. Thus, in 
column 11 of Table 6, labelled $H_T$, 
it can be seen that the best cutting 
line for this decision would be be- 
tween 299 and 300, for 85.9% of all 
predictions would be correct if those 
above the line were predicted to 
become delinquent and all those be- 
low the line nondelinquent. Any 
other cutting line would result in a 
smaller proportion of correct predic- 
tions, and, in fact, any cutting line 
set lower than this point would make 
the index inferior to the use of the 
base rates, for if all cases were pre- 
dicted to be nondelinquent, the total 
proportion of hits would be .80. 

Maximizing the number of correct 
predictions or classifications for posi- 
tives. The primary use of a prediction 
device may be for selection of (a) 
students who will succeed in a train- 
ing program, (b) applicants who will 
succeed in a certain job, (c) patients 
who will benefit from a certain type 
of therapy, etc. In the present illus- 
tration, the index would most likely 
be used for detection of those who are 
likely to become delinquents. Thus, 
the aim might be to maximize the 
number of hits only within the 
group predicted by the index to be- 
come delinquents (predicted posi- 
tives = $NPp_1 + NQp_2$). The propor- 
tion of correct predictions for this 
group by the use of different cutting 
lines is given in column 13, labelled 
$H_P$. Thus, if a cutting line is set 
between 399 and 400, one will be 
correct over 92 times in 100 if pre- 
dictions are made only for persons 
scoring above the cutting line. The 
formula for determining the efficiency 
of the test when only positive predic- 
tions are made is $H_P = Pp_1/(Pp_1 + Qp_2)$. 

One has to pay a price for achieving 
a very high level of accuracy with 
the index. Since the problem is to 
select potential delinquents so that 
some sort of therapy can be at- 
tempted, the proportion of this 
selected group in the total sample 
may be considered as a selection 
ratio. The selection ratio for positives 
is $R_P = Pp_1 + Qp_2$, that is, predictions 
are made only for those above the 
cutting line. The selection ratio for 
each possible cutting line is shown 
in column 12 of Table 6, labelled 
$R_P$. It can be seen that to obtain 
maximum accuracy in selection of 
delinquents (92.6%), predictions can 
be made for only 2.4% of the popula- 
tion. For other cutting lines, the 
accuracy of selection and the cor- 
responding selection ratios are given 
in Table 6. The worker applying the 
index must use his own judgment in 
deciding upon the level of accuracy 
and the selection ratio desired. 
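
The search for the two optimal cutting lines can be sketched as follows. The delinquent frequencies and the valid negative rates are those of columns 1 and 7 of Table 6, and the delinquency rate of .20 is the one assumed in the text. The variable names, and the use of 0 to stand for the "<150" interval, are illustrative assumptions.

```python
from itertools import accumulate

# Lower bound of each score interval, highest first (0 stands for "<150").
cut_above = [400, 350, 300, 250, 200, 150, 0]
delinquent_f = [51, 73, 141, 122, 40, 19, 5]             # column 1
q2 = [.9977, .9795, .9271, .7677, .6128, .3804, .0000]   # column 7
P = 0.20        # delinquency rate assumed in the text
Q = 1 - P

# Cumulative proportion of delinquents scoring at or above each cut (p1).
cum = list(accumulate(delinquent_f))
p1 = [c / cum[-1] for c in cum]

table = []
for cut, vp, vn in zip(cut_above, p1, q2):
    fp = 1 - vn                      # false positive rate p2
    H_T = P * vp + Q * vn            # hits over all cases
    R_P = P * vp + Q * fp            # selection ratio for positives
    H_P = P * vp / R_P               # hits among predicted positives
    table.append((cut, H_T, R_P, H_P))

best_overall = max(table, key=lambda row: row[1])
best_positive = max(table, key=lambda row: row[3])
print("best cut for all cases:", best_overall[0], round(best_overall[1], 3))
print("best cut for positives:", best_positive[0], round(best_positive[3], 3),
      "selection ratio", round(best_positive[2], 3))
```

Because the sketch works from unrounded proportions, the hit rate among positives at the highest cut comes out a shade below the 92.6% quoted in the text, which appears to rest on the rounded table entries.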

Maximizing the number of correct 
predictions or classifications for nega- 
tives. In some selection problems, the 
goal is the selection of negatives 
rather than positives. Then, the pro- 
portion of hits among all predicted 
negative for any given cutting line 
is $H_N = Qq_2/(Qq_2 + Pq_1)$, and the selec- 
tion ratio for negatives is $R_N = Pq_1 + Qq_2$. 
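
As an illustration of the formulas for negatives, the sketch below evaluates them for the cutting line between 299 and 300 in the delinquency data. It uses the 265 of 451 delinquents scoring above that line and the valid negative rate of .9271 from Table 6; the function name is an illustrative assumption.

```python
def negative_selection(P, q1, q2):
    """Return (H_N, R_N): hits among predicted negatives and the
    selection ratio for negatives, at positive base rate P."""
    Q = 1 - P
    R_N = P * q1 + Q * q2
    H_N = Q * q2 / R_N
    return H_N, R_N

# Cut between 299 and 300, with a delinquency rate of .20.
q1 = 1 - 265 / 451      # delinquents falling below the line
q2 = 0.9271             # nondelinquents falling below the line
H_N, R_N = negative_selection(0.20, q1, q2)
print(round(H_N, 2), round(R_N, 3))
```

On these figures about 82% of the population would be called nondelinquent, and roughly nine in ten of those calls would be correct, against a base rate of .80 for nondelinquency.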

In all of the above manipulations of 
cutting lines, it is essential that there 
be a large number of cases. Other- 
wise, the percentages about any given 
cutting line would be so unstable that 
very dissimilar results would be ob- 
tained on new samples. For most 
studies in clinical psychology, there- 
fore, it would be necessary to estab- 
lish cutting lines according to the 
decisions and methods discussed 
above, and then to cross validate a 
specific cutting line on new samples. 

The amount of shrinkage to be 
expected in the cross validation of 
cutting lines cannot be determined 
until a thorough mathematical and 
statistical study of the subject is 
made. It may be found that when
criterion distributions are approximately
normal and large, cutting
lines should be established in terms
of the normal probability table
rather than on the basis of the observed
p and q values found in the
samples. In a later section dealing
with the selection ratio we shall see
that it is sometimes the best procedure
to select all individuals falling
above a certain cutting line and to
select the others needed to reach the
selection ratio by choosing at random
below the line; or in other cases to
establish several different cuts defining
ranges within which one or
the opposite decision should be
made.

TABLE 7

PERCENTAGE OF DELINQUENTS (D) AND NONDELINQUENTS (ND) IN EACH PREDICTION
INDEX SCORE INTERVAL IN A POPULATION IN WHICH THE DELINQUENCY RATE IS .20*

Prediction     No. of   % of D     No. of   % of ND    Total of   % of D and ND
Index Score    D        in Score   ND       in Score   D and ND   in Score
Interval                Interval            Interval              Interval

400+             51      92.7          4       7.3         55        100
350-399          73      68.9         33      31.1        106        100
300-349         141      59.7         95      40.3        236        100
250-299         122      29.8        288      70.2        410        100
200-249          40      12.5        279      87.5        319        100
150-199          19       4.3        419      95.7        438        100
<150              5        .7        686      99.3        691        100

Total           451                 1804                 2255

* Modification of Table XX-2, p. 261, from Glueck and Glueck (7).

Decisions based on score intervals 
rather than cutting lines. The Gluecks’ 
data can be used to illustrate another 
approach to psychometric classifica- 
tion and prediction when scores for 
large samples are available with a 
relatively large number of cases in 
each score interval. In Table 7 are 
listed frequencies of delinquents and 
nondelinquents for prediction index 
score intervals. The frequencies for 
delinquents are the same as those in 
Table 6, whereas those for nondelinquents
have been corrected for a
base rate of .20 by multiplying each
frequency in column 4 of Table 6
by

$$\frac{(.80)(451)}{(.20)(439)}$$

Table 7 indicates the proportion of 
delinquents and nondelinquents
among all juveniles who fall within 
a given score interval when the 
base rate of delinquency is .20. It 
can be predicted that of those scoring 
400 or more, 92.7% will become de- 
linquent, of those scoring between 350 
and 399, 68.9% will be delinquent, 
etc. Likewise, of those scoring be- 
tween 200 and 249, it can be predicted 
that 87.5% will not become delin- 
quent. Since 80% of predictions will 
be correct without the index if all 
cases are called nondelinquent, one 
would not predict nondelinquency 
with the index in score intervals over 
249. Likewise, it would be best not
to predict delinquency for individuals
in the intervals under 250 because
20% of predictions will be correct if
the base rate is used.

* The Gluecks’ Tables XX-2, 3, 4, 5, (7,
pp. 261-262) and their interpretations therefrom
are apt to be misleading because of their
exclusive consideration of approximately
equal base rates of delinquency and nondelinquency.
Reiss (13), in his review of the
Gluecks’ study, has also discussed their use
of an unrepresentative rate of delinquency.
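
The interval-by-interval reasoning can be sketched in a few lines of Python. This is an illustration only, not a procedure given in the source: the names `TABLE_7` and `interval_summary` are hypothetical, the counts are those of Table 7 at a base rate of .20, and the D counts for the 350-399 and 300-349 intervals are taken here as the interval total minus the ND count.

```python
# (score interval, delinquents, nondelinquents) at a base rate of .20, as in Table 7.
TABLE_7 = [
    ("400+", 51, 4),
    ("350-399", 73, 33),
    ("300-349", 141, 95),
    ("250-299", 122, 288),
    ("200-249", 40, 279),
    ("150-199", 19, 419),
    ("<150", 5, 686),
]


def interval_summary(rows, base_rate):
    """Percentage of D and ND per interval, and whether each call beats the base rate."""
    summary = []
    for label, d, nd in rows:
        pct_d = 100.0 * d / (d + nd)
        pct_nd = 100.0 - pct_d
        summary.append({
            "interval": label,
            "pct_d": round(pct_d, 1),
            "pct_nd": round(pct_nd, 1),
            # A call of "delinquent" is worth making from the index only where it is
            # right more often than the same call made from the base rate alone.
            "predict_d": pct_d > 100.0 * base_rate,
            # Likewise for "nondelinquent", which is right 80% of the time unaided.
            "predict_nd": pct_nd > 100.0 * (1.0 - base_rate),
        })
    return summary


for row in interval_summary(TABLE_7, base_rate=0.20):
    print(row)
```

With these counts the sketch marks nondelinquency as worth predicting only in the intervals under 250 and delinquency only in the intervals of 250 and above, which is the division described in the text.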

It should be emphasized that there 
are different ways of quantifying 
one’s clinical errors, and they will, 
of course, not all give the same evalu- 
ation when applied in a given setting. 
“Per cent valid positives” ($= p_1$) is
rarely if ever meaningful without the
correlated “per cent false positives”
($= p_2$), and clinicians are accustomed
to the idea that we pay for an increase
in the first by an increase in
the second, whenever the increase is
achieved not by an improvement in
the test’s intrinsic validity but by a
shifting of the cutting score. But the
two quantities $p_1$ and $p_2$ do not define
our over-all hit frequency, which
depends also upon the base rates
P and Q. The three quantities $p_1$,
$p_2$, and P do, however, contain all
the information needed to evaluate
the test with respect to any given
sign or cutting score that yields these
values. Although $p_1$, $p_2$, and P contain
the relevant information, other
forms of it may be of greater importance.
No two of these numbers, for
example, answer the obvious question
most commonly asked (or vaguely
implied) by psychiatrists when an
inference is made from a sign, viz.,
“How sure can you be on the basis
of that sign?” The answer to this
eminently practical query involves
a probability different from any of
the above, namely, the inverse probability
given by Bayes’ formula:

$$H_P = \frac{Pp_1}{Pp_1 + Qp_2}$$


Even a small improvement in the
hit frequency to $H'_T = Pp_1 + Qq_2$ over
the $H_T = P$ attainable without the
test may be adjudged as worth while
when the increment $\Delta H_T$ is multiplied
by the N examined in the course
of one year and is thus seen to involve
a dozen lives or a dozen curable
schizophrenics. On the other hand, 
the simple fact that an actual shrink- 
age in total hit rate may occur seems 
to be unappreciated or tacitly ignored 
by a good deal of clinical practice. 
One must keep constantly in mind 
that numerous diagnostic, prognostic, 
and dynamic statements can be made 
about almost all neurotic patients 
(e.g., “depressed,” “inadequate ability
to relate,” “sexual difficulties”)
or about very few patients (e.g.,
“dangerous,” “will act out in therapy,”
“suicidal,” “will blow up into
a schizophrenia”). A psychologist
who uses a test sign that even cross
validates at $p_1 = q_2 = 80\%$ to determine
whether “depression” is present
or absent, working in a clinical popu- 
lation where practically everyone is 
fairly depressed except a few psy- 
chopaths and old-fashioned hysterics, 
is kidding himself, the psychiatrist, 
and whoever foots the bill. 
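
The point about the “depression” sign can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the base rate of .90 is an assumed stand-in for a population in which “practically everyone” is depressed, since the text gives no figure.

```python
def hit_rate_with_sign(P, p1, q2):
    """Over-all hit frequency when every case is decided by the sign: P*p1 + Q*q2."""
    return P * p1 + (1.0 - P) * q2


def hit_rate_from_base_rate(P):
    """Hit frequency from always calling the more frequent class."""
    return max(P, 1.0 - P)


P = 0.90  # assumed base rate of "depressed" in the clinic (illustrative)
with_sign = hit_rate_with_sign(P, p1=0.80, q2=0.80)
without_sign = hit_rate_from_base_rate(P)
print(f"with sign: {with_sign:.2f}   base rate alone: {without_sign:.2f}")
```

With $p_1 = q_2$ the hit frequency of the sign equals that common value whatever the base rate, so a sign that cross validates at 80% loses to the base rate as soon as P exceeds .80.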


“SUCCESSIVE-HURDLES”
APPROACH


Tests having low efficiency, or
having moderate efficiency but applied
to populations having very unbalanced
base rates ($P \neq Q$) are sometimes
defended by adopting a “crude
initial screening” frame of reference,
and arguing that certain other procedures
(whether tests or not) can
be applied to the subset identified
by the screener (“successive hurdles”).
There is no question that in
some circumstances (e.g., military 
induction, or industrial selection with 
a large labor market) this is a 
thoroughly defensible position. How- 
ever, as a general rule one should 
examine this type of justification 
critically, with the preceding con- 
siderations in mind. Suppose we 
have a test which distinguishes brain-tumor
from non-brain-tumor patients
with 75% accuracy and no
differential bias ($p_1 = q_2 = .75$). Under
such circumstances the test hit
rate $H_T$ is .75 regardless of the base
rate. If we use the test in making
our judgments, we are correct in our
diagnoses 75 times in 100. But if
only one patient in 10 actually
has a brain tumor, we will drop our
over-all “success” from 90% (attainable
by diagnosing “No tumor”
in all cases) to 75%. We do, however,
identify 3 out of 4 of the real brain
tumors, and in such a case it seems
worth the price. The “price” has
two aspects to it: We take time
to give the test, and, having given
it, we call many “tumorous” who are
not. Thus, suppose that in the course
of a year we see 1000 patients. Of
these, 900 are non-tumor, and we
erroneously call 225 of these “tumor.”
To pick up (100) (.75)=75 of the 
tumors, all 100 of whom would have 
been called tumor-free using the 
base rates alone, we are willing to 
mislabel 3 times this many as tumor- 
ous who are actually not. Putting it 
another way, whenever we say 
“tumor” on the basis of the test, the 
chances are 3 to 1 that we are mistaken.
When we “rule out” tumor
by the test, we are correct 96% of
the time, an improvement of only 6%
in the confidence attachable to a
negative finding over the confidence
yielded by the base rates.⁴

⁴ Improvements are expressed throughout
this article as absolute increments in percentage
of hits, because: (a) This avoids the complete
arbitrariness involved in choosing between
original hit rate and miss rate as starting
denominator; and (b) for the clinician,
the person is the most meaningful unit of
gain, rather than a proportion of a proportion
(especially when the reference proportion is
very small).

Now, picking up the successive-hurdles
argument, suppose a major
decision (e.g., exploratory surgery)
is allowed to rest upon a second test
which is infallible but for practically
insuperable reasons of staff, time, 
etc., cannot be routinely given. We 
administer Test 2 only to “positives”
on (screening) Test 1. By this tactic
we eliminate all 225 false positives 
left by Test 1, and we verify the 75 
valid positives screened in by Test 
1. The 25 tumors that slipped 
through as false negatives on Test 1 
are, of course, not picked up by Test 
2 either, because it is not applied to 
them. Our total hit frequency is now 
97.5%, since the only cases ultimate- 
ly misclassified out of our 1000 seen 
are these 25 tumors which escaped 
through the initial sieve Test 1. We 
are still running only 7½% above the
base rate. We have had to give our 
short-and-easy test to 1000 indi- 
viduals and our cumbersome, expen- 
sive test to 300 individuals, 225 
of whom turn out to be free of tumor. 
But we have located 75 patients with 
tumor who would not otherwise 
have been found. 

Such examples suggest that, except
in “life-or-death” matters, the
successive-screenings argument mere- 
ly tends to soften the blow of Bayes’ 
Rule in cases where the base rates are 
very far from symmetry. Also, if 
Test 2 is not assumed to be infallible 
but only highly effective, say 90% 
accurate both ways, results start 
looking unimpressive again. Our net 
false positive rate rises from zero to 
22 cases miscalled “tumor,” and we
operate 67 of the actual tumors instead
of 75. The total hit frequency
drops to 94.5%, only 4½% above
that yielded by a blind guessing of 
the modal class. 
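
The arithmetic of the two-stage example can be reproduced with a short Python sketch. It is offered as an illustration under the assumptions stated in the text (1000 patients, a base rate of .10, a screening test 75% accurate both ways); the function name `successive_hurdles` and the dictionary keys are hypothetical.

```python
def successive_hurdles(n, base_rate, acc1, acc2):
    """Two-stage screening in which Test 2 is given only to Test 1 positives.

    acc1 and acc2 are the accuracies of the two tests, each assumed equal
    for tumor and non-tumor cases (no differential bias).
    """
    tumors = n * base_rate
    clear = n - tumors

    # Stage 1: the quick screening test, given to everyone.
    tp1 = tumors * acc1          # tumors screened in
    fn1 = tumors - tp1           # tumors missed; never seen by Test 2
    fp1 = clear * (1.0 - acc1)   # non-tumor cases screened in
    tn1 = clear - fp1            # non-tumor cases correctly dismissed

    # Stage 2: the expensive test, given only to stage-1 positives.
    tp2 = tp1 * acc2             # tumors confirmed
    fp2 = fp1 * (1.0 - acc2)     # non-tumor cases still called "tumor"
    tn2 = fp1 - fp2              # stage-1 false positives cleared

    hits = tp2 + tn1 + tn2
    return {
        "second_tests_given": tp1 + fp1,
        "tumors_found": tp2,
        "tumors_missed": fn1 + (tp1 - tp2),
        "false_positives": fp2,
        "hit_rate": hits / n,
        "gain_over_base_rate": hits / n - max(base_rate, 1.0 - base_rate),
    }


print(successive_hurdles(1000, 0.10, 0.75, 1.00))  # infallible Test 2
print(successive_hurdles(1000, 0.10, 0.75, 0.90))  # Test 2 90% accurate both ways
```

The fallible case yields fractional counts (67.5 tumors found, 22.5 false positives), which the text reports as 67 and 22.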


THE SELECTION RATIO 


Straightforward application of the 
preceding principles presupposes that 
the clinical decision maker is free to 
adopt a policy solely on the basis of 
maximizing hit frequency. Sometimes
there are external constraints
such as staff time, administrative 
policy, or social obligation which 
further complicate matters. It may 
then be impossible to make all de- 
cisions in accordance with the base 
rates, and the task given to the test 
is that of selecting a subset of cases 
which are decided in the direction 
opposite to the base rates but will 
still contain fewer erroneous decisions 
than would ever be yielded by oppos- 
ing the base rates without the test. 
If 80% of patients referred to a 
Mental Hygiene Clinic are recover- 
able with intensive psychotherapy, 
we would do better to treat every- 
body than to utilize a test yielding 
75% correct predictions. But sup- 
pose that available staff time is 
limited so that we can treat only half 
the referrals. The Bayes-type injunc- 
tion to “follow the base rates when 
they are better than the test’’ be- 
comes pragmatically meaningless, for 
it directs us to make decisions which 
we cannot implement. The imposition
of an externally imposed selection
ratio, not determined on the
basis of any maximizing or minimizing
policy but by nonstatistical
considerations, renders the test
worth while.

TABLE 8

ACTUAL AND TEST-PREDICTED
THERAPEUTIC OUTCOME

                    Therapeutic Outcome
Test Prediction     Good     Poor     Total

Good                 60        5       65
Poor                 20       15       35

Total                80       20      100

Prior to imposition of any arbitrary
selection ratio, the fourfold
table for 100 referrals might be as
shown in Table 8. If the aim were
simply to minimize total errors, we
would predict “good” for each case
and be right 80 times in 100. Using 
the test, we would be right only 75 
times in 100. But suppose a selec- 
tion ratio of .5 is externally imposed. 
We are then forced to predict “poor”
for half the cases, even though this
“prediction” is, in any given case,
likely to be wrong. (More precisely,
we handle this subset as if we predicted
“poor,” by refusing to treat.)
So we now select our 50 to-be-treated 
cases from among those 65 who fall 
in the “test-good” array, having a 
frequency of 60/65=92.3% hits 
among those selected. This is better 
than the 80% we could expect 
(among those selected) by choosing 
half the total referrals at random. 
Of course we pay for this, by making 
many “false negative’ decisions; 
but these are necessitated, whether we 
use the test or not, by the fact that 
the selection ratio was determined 
without regard for hit maximization 
but by external considerations. Without
the test, our false negative rate
$q_1$ is 50% (i.e., 40 of the 80 “good”
cases will be called “poor”); the test
reduces the false negative rate to
42.5% (= 34/80), since 15 cases from
above the cutting line must be
selected at random for inclusion in
the not-to-be-treated group below
the cutting line [i.e., 20 + (60/65)(15)
= 34]. Stated in terms of correct
decisions, without the test 40 out 
of 50 selected for therapy will have a 
good therapeutic outcome; with the 
test, 46 in 50 will be successes. 
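
The figures in this example follow directly from the cells of Table 8, as the Python sketch below illustrates. The function name `forced_selection` and its argument names are hypothetical, and the sketch covers only the case treated in the text, in which the number to be selected does not exceed the number the test calls “good.”

```python
def forced_selection(gg, gp, pg, pp, n_select):
    """Expected outcomes when exactly n_select cases must be chosen for treatment.

    gg: test good, outcome good    gp: test good, outcome poor
    pg: test poor, outcome good    pp: test poor, outcome poor
    """
    n = gg + gp + pg + pp
    actual_good = gg + pg
    test_good = gg + gp
    if n_select > test_good:
        raise ValueError("sketch covers only n_select <= cases called good by the test")

    # Without the test, the n_select cases are drawn at random from all referrals.
    random_successes = n_select * actual_good / n
    # With the test, they are drawn at random from the "test-good" array.
    test_successes = n_select * gg / test_good

    return {
        "successes_without_test": random_successes,
        "successes_with_test": test_successes,
        "false_negative_rate_without_test": (actual_good - random_successes) / actual_good,
        "false_negative_rate_with_test": (actual_good - test_successes) / actual_good,
    }


result = forced_selection(gg=60, gp=5, pg=20, pp=15, n_select=50)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The unrounded false negative rate with the test comes to about 42.3%; the 42.5% in the text results from rounding 33.85 up to 34 cases.
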
Reports of studies in which for- 
mulas are developed from psycho- 
metrics for the prediction of patients’ 
continuance in psychotherapy have 
neglected to consider the relationship 
of the selection ratio to the specific 
population to which the prediction 
formula is to be applied. In each 
study the population has consisted 
of individuals who were accepted for
therapy by the usual methods employed
at an outpatient clinic, and
the prediction formula has been
evaluated only for such patients. It
is implied by these studies that the 
formula would have the same effi- 
ciency if it were used for the selection 
of “continuers” from all those applying
for therapy. Unless the formula
is tested on a random sample
of applicants who are allowed to enter 
therapy without regard to their test 
scores, its efficiency for selection pur- 
poses is unknown. The reported 
efficiency of the prediction formula 
in the above studies pertains only to 
its use in a population of patients 
who have already been selected for 
therapy. There is little likelihood
that the formula can be used in any
practical way for further selection of 
patients unless the clinic’s therapists 
are carrying a far greater load than 
they plan to carry in the future. 

The use of the term “selection”
(as contrasted with “prediction” or
“placement”) ought not to blind us
to the important differences between 
industrial selection and its clinical 
analogue. The incidence of false 
negatives—of potential employees 
screened out by the test who would 
actually have made good on the job 
if hired—is of little concern to man- 
agement except as it costs money to 
give tests. Hence the industrial 
psychologist may choose to express 
his aim in terms of minimizing the 
false positives, i.e., of seeing to it 
that the job success among those hired 
is as large a rate as possible. When 
we make a clinical decision to treat 
or not to treat, we are withholding 
something from people who have a 
claim upon us in a sense that is much 
stronger than the “right to work”
gives a job applicant any claim upon
a particular company. So, even
though we speak of a “selection ratio”
in clinical work, it must be remembered
that those cases not selected are
patients about whom a certain kind 
of important negative decision is 
being made. 

For any given selection ratio, maxi- 
mizing total hits is always equivalent 
to maximizing the hit rate for either 
type of decision (or minimizing the 
errors of either, or both, kinds), since 
cases shifted from one cell of the table 
have to be exactly compensated for. 
If m “good” cases that were correctly
classified by one decision
method are incorrectly classified by
another, maintenance of the selection
ratio entails that m cases correctly
called “poor” are also miscalled
“good” by the new method. Hence
an externally imposed selection ra- 
tio eliminates the often troublesome 
value questions about the relative 
seriousness of the two kinds of errors, 
since they are unavoidably increased 
or decreased at exactly the same rate. 

If the test yields a score or a con- 
tinuously varying index of some 
kind, the values of $p_1$ and $p_2$ are not
fixed, as they may be with “patterns”
or “signs.” Changes in the selection
ratio, R, will then suggest shifting
the cutting scores or regions on the
basis of the relations obtaining among
R, P, and the $p_1$, $p_2$ combinations
yielded by various cuts. It is worth
special comment that, in the case of
continuous distributions, the optimum
procedure is not always to
move the cut until the total area 
truncated = NR, selecting all above 
that cut and rejecting all those below. 
Whether this “obvious” rule is wise 
or not depends upon the distribution 
characteristics. We have found it 
easy to construct pairs of distributions
such that the test is “discriminating”
throughout, in the sense that
the associated cumulative frequencies
$q_1$ and $q_2$ maintain the same direction
of their inequality everywhere in the
range, yet in which the hit frequency given
by a single cut at R is inferior to that
given by first selecting with a cut
which yields $N_1 < NR$, and then
picking up the remaining $(NR - N_1)$
cases at random below the cut. Other
more complex situations may arise 
in which different types of decisions 
should be made in different regions, 
actually reversing the policy as we 
move along the test continuum. Such 
numerical examples as we have con- 
structed utilize continuous, unimodal 
distributions, and involve differences 
in variability, skewness, and kurtosis 
not greater than those which arise 
fairly often in clinical practice. Of 
course the utilization of any very 
complicated pattern of regions re- 
quires more stable distribution fre- 
quencies than are obtainable from the 
sample sizes ordinarily available to 
clinicians. 

It is instructive to contemplate 
some of the moral and administra- 
tive issues involved in the practical 
application of the preceding ideas. It 
is our impression that a good deal of 
clinical research is of the “So— 
what?” variety, not because of de- 
fects in experimental design such as 
inadequate cross validation but be- 
cause it is hard to see just what are 
the useful changes in decision making 
which could reasonably be expected 
to follow. Suppose, for example, it is 
shown that “duration of psycho- 
therapy” is 70% predictable from a 
certain test. Are we prepared to pro- 
pose that those patients whose test 
scores fall in a certain range should 
not receive treatment? If not, then 
is it of any real advantage therapeutically
to “keep in mind” that the patient
has 7 out of 10 chances of staying
longer than 15 hours, and 3 out
of 10 chances of staying less than 
that? We are not trying to poke 
fun at research, since presumably 
almost any lawful relationship stands 
a chance of being valuable to our 
total scientific comprehension some 
day. But many clinical papers are 
ostensibly inspired by practical aims, 
and can be given theoretical inter- 
pretation or fitted into any larger 
framework only with great difficulty 
if at all. It seems appropriate to 
urge that such “practical”-oriented
investigations should be really prac- 
tical, enabling us to see how our 
clinical decisions could rationally be 
modified in the light of the findings. 
It is doubtful how much of current 
work could be justified in these 
terms. 

Regardless of whether the test 
validity is capable of improving on 
the base rates, there are some pre- 
diction problems which have practical 
import only because of limitations in 
personnel. What other justification 
is there for the great emphasis in 
clinical research on “prognosis,”
“treatability,” or “stayability”? The
very formulation of the predictive
task as “maximizing the number of
hits” already presupposes that we
intend not to treat some cases; since 
if we treat all comers, the ascertain- 
ment of a bad prognosis score has 
no practical effect other than to 
discourage the therapist (and thus 
hinder therapy?). If intensive psy- 
chotherapy could be offered to all 
veterans who are willing to accept 
referral to a VA Mental Hygiene 
Clinic, would it be licit to refuse 
those who had the poorest outlook? 
Presumably not. It is interesting to 
contrast the emphasis on prognosis 
in clinical psychology with that in, 
say, cancer surgery, where the treatment
of choice may still have a very
low probability of “success,” but is
nevertheless carried out on the basis 
of that low probability. Nor does 
this attitude seem unreasonable, since 
no patient would refuse the best 
available treatment on the ground 
that even it was only 10% effective. 
Suppose a therapist, in the course 
of earning his living, spends 200
hours a year on nonimprovers by 
following a decision policy that also 
results in his unexpected success with 
one 30-year-old “poor bet.” If this
client thereby gains 16 × 365 × 40
= 233,600 hours averaging 50% less
anxiety during the rest of his natural 
life, it was presumably worth the 
price. 

These considerations suggest that, 
with the expansion of professional 
facilities in the behavior field, the 
prediction problem will be less like 
that of industrial selection and more 
like that of placement. “To treat or
not to treat” or “How treatable”
or “How long to treat” would be
replaced by “What kind of treatment?”
But as soon as the problem
is formulated in this way, the external
selection ratio is usually no longer
imposed. Only if we are deciding
between such alternatives as classical 
analysis and, say, 50-hour interpreta- 
tive therapy would such personnel 
limitations as can be expected in 
future years impose an arbitrary R. 
But if the decision is between such 
alternatives as short-term interpreta- 
tive therapy, Rogerian therapy, 
Thorne’s directive therapy, hypnotic 
retraining, and the method of tasks 
(10, 16, 19), we could “follow the
base rates” by treating every patient
with the method known to
have the highest success frequency
among patients “similar” to him.
The criteria of similarity (class
membership) will presumably be
multiple, both phenotypic and genotypic,
and will have been chosen because
of their empirically demonstrated
prognostic relevance rather
than by guesswork, as is current 
practice. Such an idealized situation 
also presupposes that the selection 
and training of psychotherapists will 
have become socially realistic so that 
therapeutic personnel skilled in the 
various methods will be available in 
some reasonable proportion to the 
incidence with which each method is 
the treatment of choice. 

How close are we to the upper 
limit of the predictive validity of 
personality tests, such as was reached 
remarkably early in the development 
of academic aptitude tests? If the 
now-familiar ⅔ to ¾ proportions of
hits against even-split criterion di- 
chotomies are already approaching 
that upper limit, we may well dis- 
cover that for many decision prob- 
lems the search for tests that will 
significantly better the base rates is a 
rather unrewarding enterprise. When 
the criterion is a more circumscribed 
trait or symptom (“depressed,” “affiliative,”
“sadistic,” and the like),
the difficulty of improving upon the 
base rates is combined with the 
doubtfulness about how valuable it is 
to have such information with 75% 
confidence anyhow. But this involves 
larger issues beyond the scope of the 
present paper. 


AVAILABILITY OF INFORMATION 
ON BASE RATES 


The obvious difficulty we face in 
practical utilization of the preceding 
formulas arises from the fact that 
actual quantitative knowledge of the 
base rates is usually lacking. But this 
difficulty must not lead to a dismissal 
of our considerations as clinically ir- 
relevant. In the case of many clini- 
cal decisions, chiefly those involving 
such phenotypic criteria as overt 
symptoms, formal diagnosis, subsequent
hospitalization, persistence in
therapy, vocational or marital adjustment,
and the numerous “surface”
personality traits which clinicians
try to assess, the chief reason
for our ignorance of the base rates is 
nothing more subtle than our failure 
to compute them. The file data avail- 
able in most installations having a 
fairly stable source of clientele would 
yield values sufficiently accurate to 
permit minimum and maximum esti- 
mates which might be sufficient to 
decide for or against use of a pro- 
posed sign. It is our opinion that this 
rather mundane taxonomic task is of 
much greater importance than has 
been realized, and we hope that the 
present paper will impel workers to 
more systematic efforts along these 
lines. 

Even in the case of more subtle, 
complex, and genotypic inferences, 
the situation is far from hopeless. 
Take the case of some such dynamic
attribution as “strong latent dependency,
which will be anxiety-arousing
as therapy proceeds.” If
this is so difficult to discern even during
intensive therapy that a therapist’s
rating on it has too little reliability
for use as a criterion, it is hard
to see just what is the value of guessing
it from psychometrics. If a
skilled therapist cannot discriminate
the personality characteristic after
considerable contact with the patient,
it is at least debatable whether the
characteristic makes any practical
difference. On the other hand, if it
can be reliably judged by therapists,
the determination of approximate
base rates again involves nothing
more complex than systematic recording
of these judgments and subsequent
tabulation. Finally, “clinical
experience” and “common sense”
must be invoked when there is nothing
better to be had. Surely if the
$q_1/q_2$ ratio for a test sign claiming
validity for “difficulty in accepting
inner drives” shows from the formula
that the base rate must not exceed 
.65 to justify use of the sign, we can 
be fairly confident in discarding it for 
use with any psychiatric population! 
Such a “backward” use of the for- 
mula to obtain a maximum useful 
value of P, in conjunction with the 
most tolerant common-sense esti- 
mates of P from daily experience, 
will often suffice to answer the ques- 
tion. If one is really in complete 
ignorance of the limits within which 
P lies, then obviously no rational 
judgment as to the probable efficiency 
of the sign can be made. 
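
The “backward” use of the formula can be illustrated as follows. The sketch rests on a derivation made here from the hit frequency $Pp_1 + Qq_2$, not on a formula quoted in this passage: the sign beats the policy of calling every case positive only when $Pp_1 + Qq_2 > P$, which rearranges to $P < q_2/(q_1 + q_2)$. The function names and the values $q_1 = .35$, $q_2 = .65$ are hypothetical, chosen so that the ceiling comes out at .65.

```python
def max_useful_base_rate(q1, q2):
    """Largest base rate P at which the sign still beats calling every case positive.

    q1 : proportion of actual positives lacking the sign (false negatives)
    q2 : proportion of actual negatives lacking the sign (valid negatives)
    Follows from P*p1 + Q*q2 > P, i.e. Q*q2 > P*q1, i.e. P < q2 / (q1 + q2).
    """
    return q2 / (q1 + q2)


def sign_beats_base_rate(P, q1, q2):
    """Direct check of the same inequality for a given base rate P."""
    p1 = 1.0 - q1
    return P * p1 + (1.0 - P) * q2 > P


ceiling = max_useful_base_rate(q1=0.35, q2=0.65)
print(f"base rate must not exceed {ceiling:.2f}")
print(sign_beats_base_rate(0.60, 0.35, 0.65))  # estimate of P below the ceiling
print(sign_beats_base_rate(0.70, 0.35, 0.65))  # estimate of P above the ceiling
```

If even the most tolerant estimate of P from daily experience lies above the ceiling, the sign can be set aside without precise base-rate data.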


ESTIMATION VERSUS 
SIGNIFICANCE 


A further implication of the fore- 
going thinking is that the exactness 
of certain small sample statistics, 
or the relative freedom of certain 
nonparametric methods from dis- 
tribution assumptions, has to be 
stated with care lest it mislead clini- 
cians into an unjustified confidence. 
When an investigator concludes that 
a sign, item, cutting score, or pattern 
has “validity” on the basis of small
sample methods, he has rendered a
certain very broad null hypothesis
implausible. To decide, however,
whether this “validity” warrants
clinicians in using the test is (as every 
statistician would insist) a further 
and more complex question. To 
answer this question, we require more 
than knowledge that $p_1 \neq p_2$. We
need in addition to know, with respect
to each decision for which the
sign is being proposed, whether the
appropriate inequality involving $p_1$,
$p_2$, and P is fulfilled. More than this,
since we will usually be extrapolating 
to a somewhat different clinical 
population, we need to know whether 
altered base rates P’ and Q’ will 
falsify these inequalities. To do this 
demands estimates of the test parameters
$p_1$ and $p_2$, the setting up of confidence
belts for their difference
$p_1 - p_2$ rather than the mere proof of
their nonidentity. Finally, if the sign 
is a cutting score, we will want to 
consider shifting it so as to maintain 
optimal hit frequency with new base 
rates. The effect upon $p_1$ and $p_2$ of a
contemplated movement of a critical 
score or band requires a knowledge 
of distribution form such as only a 
large sample can give. 
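
The contrast between proving nonidentity and estimating the difference can be made concrete. The sketch below uses the ordinary normal-approximation interval for a difference of two proportions, which is an assumption made here for illustration; the text does not specify how the confidence belts are to be set up, and the counts are hypothetical.

```python
import math


def diff_confidence_interval(x1, n1, x2, n2, z=1.96):
    """Approximate confidence interval for p1 - p2 (normal approximation).

    x1 of n1 criterion positives show the sign; x2 of n2 criterion negatives
    show it. z = 1.96 gives a nominal 95% interval.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    se = math.sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Same observed rates (.70 vs .40), small versus large samples (hypothetical counts).
small = diff_confidence_interval(14, 20, 8, 20)
large = diff_confidence_interval(140, 200, 80, 200)
print(f"n = 20 per group:  {small[0]:.3f} to {small[1]:.3f}")
print(f"n = 200 per group: {large[0]:.3f} to {large[1]:.3f}")
```

In the small sample the interval barely excludes zero, so the sign is “valid,” yet it runs from almost no difference to a very large one; only the larger sample locates $p_1 - p_2$ well enough to judge whether the sign would survive a change in base rates.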

As is true in all practical applica- 
tions of statistical inference, non- 
mathematical considerations enter 
into the use of the numerical patterns 
that exist among P, $p_1$, $p_2$, and R.
But “pragmatic” judgments initially
require a separation of the several 
probabilities involved, some of which 
may be much more important than 
others in terms of the human values 
associated with them. In some settings,
over-all hit rate is all that we
care about. In others, a redistribution
of the hits and misses even without
much total improvement may concern
us. In still others, the proportions
$p_1$ and $q_2$ are of primary interest;
and, finally, in some instances the
confrontation of a certain increment
in the absolute frequency ($NPp_1$) of
one group identified will outweigh 
all other considerations. 

Lest our conclusions seem unduly 
pessimistic, what constructive sug- 
gestions can we offer? We have 
already mentioned the following: (a) 
Searching for subpopulations with 
different base rates; (b) successive- 
hurdles testing; (c) the fact that even 
a very small percentage of improve- 
ment may be worth achieving in 
certain crucial decisions; (d) the need 
for systematic collection of base- 
rate data so that our several equa- 
tions can be applied. To these we 
may add two further ‘constructive’ 
comments. First, test research attention
should be largely concentrated
upon behaviors having base
rates nearer a 50-50 split, since it is
for these that it is easiest to improve 
on a base-rate decision policy by use 
of a test having moderate validity. 
There are, after all, a large number 
of clinically important traits which 
do not occur “almost always” or
“very rarely.” Test research might
be slanted more toward them; the 
current popularity of Q-sort ap- 
proaches should facilitate the growth 
of such an emphasis, by directing 
attention to items having a reason- 
able “spread” in the clinical popula- 
tion. Exceptions to such a research 
policy will arise, in those rare do- 
mains where the pragmatic conse- 
quences of the alternative decisions 
justify focusing attention almost 
wholly on maximizing $Pp_1$ with
relative neglect of $Qp_2$. Secondly,
we think the injunction “quit wasting
time on noncontributory psychometrics”
is really constructive.
When the clinical psychologist sees 
the near futility of predicting rare or 
near-universal events and traits from 
test validities incapable of improving 
upon the base rates, his clinical time 
is freed for more economically de- 
fensible activities, such as research 
which will improve the parameters 
p1 and q2; and for treating patients
rather than uttering low-confidence 
prophecies or truisms about them (in 
this connection see 12, pp. vii, 7, 
127-128). It has not been our intention
to be dogmatic about “what is
worth finding out, how often.” We
do suggest that the clinical use of 
patterns, cutting scores, and signs, 
or research efforts devoted to the 
discovery of such, should always be 
evaluated in the light of the simple 
algebraic fact discovered in 1763 by 
Mr. Bayes. 


SUMMARY 
1. The practical value of a psychometric
sign, pattern, or cutting
score depends jointly upon its in- 
trinsic validity (in the usual sense of 
its discriminating power) and the 
distribution of the criterion variable 
(base rates) in the clinical population. 
Almost all contemporary research 
reporting neglects the base-rate fac- 
tor and hence makes evaluation of 
test usefulness difficult or impossible. 

2. In some circumstances, notably 
when the base rates of the criterion 
classification deviate greatly from a 
50 per cent split, use of a test sign 
having slight or moderate validity 
will result in an increase of erroneous 
clinical decisions. 

3. Even if the test’s parameters are 
precisely known, so that ordinary 
cross-validation shrinkage is not a 
problem, application of a sign within 
a population having these same test 
parameters but a different base rate 
may result in a marked change in 
the proportion of correct decisions. 
For this reason validation studies 
should present trustworthy informa- 
tion respecting the criterion distribu- 
tion in addition to such test param- 
eters as false positive and false nega- 
tive rates. 

4. Establishment of “validity” by
exact small sample statistics, since 
it does not yield accurate information
about the test parameters (a problem
of estimation rather than signifi- 
cance), does not permit trustworthy 
judgments as to test usefulness in a 
new population with different or 
unknown base rates. 

5. Formulas are presented for determining
limits upon relations
among (a) the base rates, (b) false
negative rate, and (c) false positive 
rate which must obtain if use of the 
test sign is to improve clinical de- 
cision making. 

6. If, however, external constraints 
(e.g., available staff time) render it 
administratively unfeasible to decide 
all cases in accordance with the base 
rates, a test sign may be worth ap- 
plying even if following the base 
rates would maximize the total cor- 
rect decisions, were such a policy 
possible. 

7. Trustworthy information as to 
the base rates of various patient char- 
acteristics can readily be obtained by 
file research, and test development 
should (other things being equal) be 
concentrated on those characteristics 
having base rates nearer .50 rather 
than close to .00 or 1.00. 

8. The basic rationale is that of 
Bayes’ Theorem concerning the cal- 
culation of so-called “inverse prob- 
ability.” 
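The dependence of test usefulness on base rates can be illustrated with a short computation. The sketch below is an illustration only, assuming that P denotes the base rate of the criterion condition, Q = 1 - P, p1 the proportion of actual positives showing the sign, and p2 the proportion of actual negatives showing it (the false positive rate). The function names and the numerical values are hypothetical and are not taken from any study.

```python
def posterior_positive(P, p1, p2):
    """Bayes' theorem: probability that the condition is present given the sign.

    P  : base rate of the condition (assumed notation)
    p1 : valid positive rate of the sign (assumed notation)
    p2 : false positive rate of the sign (assumed notation)
    """
    Q = 1.0 - P
    return (P * p1) / (P * p1 + Q * p2)


def proportion_correct_with_sign(P, p1, p2):
    """Valid positives plus valid negatives when every case is decided by the sign."""
    Q = 1.0 - P
    return P * p1 + Q * (1.0 - p2)


def proportion_correct_base_rate(P):
    """Proportion correct when every case is assigned to the more frequent class."""
    return max(P, 1.0 - P)


def sign_beats_base_rate(P, p1, p2):
    """True if deciding by the sign yields more correct decisions than the base rate."""
    return proportion_correct_with_sign(P, p1, p2) > proportion_correct_base_rate(P)
```

With a base rate of .10 and a sign having p1 = .80 and p2 = .20, deciding by the sign yields .80 correct decisions, whereas calling every case negative yields .90; at a base rate of .50 the same sign is clearly preferable to the base-rate policy.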
THE PROBLEM


This review of the literature is a 
foundation stone for a research pro- 
gram on intra-individual variability, 
the variability of an individual's be- 
havior from one time to another. 
The program’s objectives include the 
exploration of the phenomena of 
variability within an individual’s be- 
havior, with consequent implications 
for theory and practice in psychomet- 
rics and in personality. The pro- 
gram seeks light on several questions. 
Can we partial out from the conventional
error variance of psychometrics
a component of variance over
time which is associated with the
individual? (Such a component would
probably make different proportional
contributions to the variance of
scores for different individuals.) Are
there variability factors analogous 
to the well-known factors of level 
scores in mental abilities, interests, 
and personality? How can the con- 
cept of intra-individual variability 
contribute to problems of the prediction
of behavior? What is the significance
of this concept for the study of
personality and for personality 
theory? Is variability within a class 
of behavior associated with degree of 
integration within an area of person- 
ality? Are the roots of variability in 
the neural or physiological function- 
ing of the organism? 

The problem of intra-individual 
variability has not been subjected 
to systematic conceptualization. 
Hence it is difficult to set up definite 
criteria for inclusion in this review 
and to organize the numerous but 
highly disparate studies which seem 
pertinent. 

As a means of structuring the problem,
we shall first delineate a model
instance. Pure intra-individual vari- 
ability is defined as the difference be- 
tween the two responses of an individual 
at two points in time under the follow- 
ing conditions: (a) the individual is 
exposed each time to the same stimulus 
or to objectively indistinguishable stimuli;
(b) the total situation in which the
responses are made is the same on both
occasions. It is doubtful whether 
such an abstract case ever exists. 
Guthrie (89) argues that the exact 
situation is never reproduced. Several 
problems await answers. What de- 
gree of homogeneity between stimuli 
and between situations is required 
before we can assume that psychological
equivalence exists? In judging
equivalence, what are the boundaries
(spatial, temporal, and psychological)
of a “total” situation?

In this formulation, the meanings 
of two words need clarification. We 
are using “situation” to refer to the
total immediate environment of the 
organism. Strictly speaking, it should
include the stimulus on which we
focus our attention. The situation 
can be defined as embracing all fac- 
tors external to the organism, which 
affect responses. 

At this point, ‘response’ is not 
given a formal definition. When we 
examine the difference between two 
responses, we may restrict ourselves 
to any one of several attributes, such 
as magnitude, intensity, latency, or 
quality. 

This paradigm involves the further 
assumption that the order of the two 
responses is immaterial. This requirement
implies that the responses show
no systematic trend over time, due to 
such processes as learning, fatigue, 
etc., i.e., that the response is not a 
function of time. This assumption 
also implies that the second response 
is not affected by either the first re- 
sponse or the first presentation of the 
stimulus. We may call this pure case 
spontaneous variability, as opposed 
to reactive or adaptive variability. 

Finally, there is an assumption 
which underlies this entire research 
program: intra-individual response 
variability is not random; it is a law- 
ful phenomenon. The variability of 
one individual’s responses to one 
stimulus is determined by more or 
less enduring factors within the indi- 
vidual. Two postulates can be de- 
rived from this assumption. (a) Given 
the same stimulus in the same situa- 
tion, the difference between an indi- 
vidual’s responses at two points in 
time is related to the difference be- 
tween his two responses (to that 
stimulus) at two other points in time. 
(b) Furthermore, the difference be- 
tween his responses to one stimulus is 
related to the difference between re- 
sponses to at least one other stimu- 
lus, objectively distinguishable from 
the first one. Presumably, the magni- 
tude of this latter relationship is a 
function of the similarity of the 
stimuli. This general fundamental
assumption is obviously testable. On
the basis of our preliminary studies 
and of research cited below, we con- 
sider that this assumption has been 
verified. 
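Postulate (a) can be checked by correlating, across individuals, the change between one pair of occasions with the change between another pair. A minimal sketch follows, assuming that each list holds one score per individual on a given occasion and that the absolute difference is an acceptable index of change; the function names and the choice of the Pearson coefficient are illustrative assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists.

    Raises ZeroDivisionError if either list has zero variance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def absolute_change(first, second):
    """Absolute difference between two occasions, one value per individual."""
    return [abs(b - a) for a, b in zip(first, second)]


def consistency_of_variability(t1, t2, t3, t4):
    """Correlate |t2 - t1| with |t4 - t3| across individuals (postulate a)."""
    return pearson(absolute_change(t1, t2), absolute_change(t3, t4))
```

A clearly positive coefficient would be in line with the postulate; a value near zero would suggest that the differences behave as random error.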

Data that exactly fit this paradigm 
are rare. Usually we shall be con- 
cerned with variability summed over 
several stimuli (e.g., number of test 
responses changed on retest) or with 
variability in a composite score based 
on the sum of several responses (e.g., 
change in total score on an instru- 
ment measuring intelligence, per- 
formance, interest, etc.). 

In the rest of this paper, we shall 
be primarily concerned with the pure 
case or approximations to it. Such 
instances will be called Type I or
spontaneous variability. 

Second in importance to us is 
Type II, the case where all condi- 
tions and assumptions for the pure 
case, Type I, are met with the excep- 
tion that the sequence of responses 
shows some pattern or order, other 
than a monotonic function of time. 
The simplest example is the alterna- 
tion of responses. In this class fall 
instances where the second response 
is affected by the first response or 
the first presentation of the stimulus 
(as in alternation) and also instances 
where the differences between suc- 
cessive responses are less marked— 
e.g., where cycles or oscillations are 
present. We shall not attempt a 
comprehensive coverage of the litera- 
ture on Type II variability, especially 
of the latter kind. 

Most instances of Type II variabil- 
ity can be classed as reactive variabil- 
ity: the change in response is deter- 
mined in part by the organism’s 
reaction to the stimulus it has recent- 
ly reacted to and/or its reaction to 
its preceding response. 

We do not know yet whether Type 
I and Type II are actually different 
in practice. Nevertheless, the concepts
underlying these classes are
quite distinct: Type II variability is
composed of variability from the 
same sources as Type I, but includes 
in addition reactive or periodic vari- 
ability. Thus all the determinants of 
Type I variability are present in 
Type II, but Type II has one extra 
and major determinant associated 
with temporal order. For the pres- 
ent, we are making the distinction 
on an empirical basis: Is there evi- 
dence that the responses show some 
simple ordering? It is possible that 
the assumption of no systematic trend 
over time which we make for Type 
I is unjustified. Cases exist, however, 
where the order of the responses can 
be treated as having a negligible effect.
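Whether a series of responses shows some simple ordering can be examined with elementary indices. The following sketch assumes a series of categorical choices (for the alternation rate) or numerical scores (for the lag-one autocorrelation); both functions are illustrative and are not drawn from the studies reviewed here.

```python
def alternation_rate(choices):
    """Proportion of successive pairs in which the response changes."""
    pairs = list(zip(choices, choices[1:]))
    return sum(a != b for a, b in pairs) / len(pairs)


def lag1_autocorrelation(values):
    """Lag-one autocorrelation of a numerical series.

    Raises ZeroDivisionError if the series is constant.
    """
    n = len(values)
    mean = sum(values) / n
    denominator = sum((v - mean) ** 2 for v in values)
    numerator = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return numerator / denominator
```

For two equally likely and independent responses the expected alternation rate is .50; values well above that figure suggest alternation, and a clearly negative lag-one autocorrelation points the same way.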

A third class of problems, Type III,
differs from the pure Type I case in 
that objectively different stimuli 
are presented on the two occasions 
or the background situation is 
changed. Here, the focus is usually 
on the appropriateness of the change 
in response. Is the difference between 
the subject’s two responses too small:
for example, does he fail to “adapt” to the
change of the stimulus (as in some 
studies of rigidity)? Is the difference 
too large—for example, does he over- 
react to stress? 

It is obvious that the Type III case 
can be taken to include any comparison
between responses in two situations
which is not included in either
Type I or Type II. Hence, our dis- 
cussion of Type III variability will 
mention only studies which may help 
us to understand Type I phenomena.
We shall not consider “scatter”
or profile variability on different 
aptitudes. In general, in this paper 
we are attempting comprehensive 
coverage of only Type I variability.
Our explorations into related topics 
are solely to clarify our central prob- 
lem and to obtain leads for attacking 
it. 

As far as we know, there have 
been no recent reviews of the literature
on Type I variability. Some of
the earlier studies are examined by 
Allport and Vernon (3, pp. 124-128). 
Solomon (201) discusses research on 
many topics related to variability, 
especially work on the avoidance of 
repetition of response (Type II 
variability). Glanzer (77) examines 
much of the literature on alternation. 
An extensive list of papers on be- 
havior in guessing is provided by 
Senders and Sowards (194). Other 
surveys of the literature on related 
topics are mentioned in subsequent 
sections of this paper. 

The sections on these three types of 
variability are followed by a survey 
of studies of the correlates of variabil- 
ity. Here are data from which may 
be gleaned some preliminary notions 
about the origin and structure of 
variability. 

In this paper, unless otherwise 
qualified, the term “variability” al- 
ways refers to intra-individual re- 
sponse variability. Any references 
to conventional interindividual or 
group variability (i.e., “individual
differences”) will be explicitly indicated.


VARIABILITY OF ORGANIC 
PROCESSES 


It is obvious that spontaneous 
variability (Type I) is produced by 
factors within the organism. In the 
pure case, the stimulus and the exter- 
nal situation are unchanged, leaving 
only the organism as the locus for 
determinants of variability. Presum- 
ably there must be some variation 
or change in organic processes under- 
lying response variability. Each of 
these processes may itself be variable 
in its functioning, or it may be regu- 
lar over time—we need make no 
assumption as to the character of this 
organic variation. However, it would 
be poor strategy to embark on a study 
of response variability without evidence
that there exists some variability
in the functioning of organic
processes, independent of direct ef- 
fects of changes in the external en- 
vironment. This section will cite 
work which supports such a belief. 

Loeb (140, p. 622) points out that 
interindividual differences and the 
nonpredictability of individual re- 
sponses increase as the structure of 
the organism, especially the nervous 
system, becomes more complex. Por- 
traying the organism as an active 
system, Bertalanffy argues that 
movements (or responses) are influenced
by “spontaneous fluctuations
in the excitations of nerve centers”
(10, p. 167); therefore, from his 
point of view, reactions are deter- 
mined by the changing internal situ- 
ation within the organism, not direct- 
ly by the stimulus outside. Rashev- 
sky elaborates on these spontaneous 
fluctuations: “...in general we
must expect spontaneous fluctuations 
of excitation to occur in the central 
nervous system. Such fluctuations 
may be due to fluctuations of meta- 
bolic activity or to excitation carried 
to a given region from a number of 
other regions of the brain, which are 
randomly excited by the stream of 
incoming exteroceptive as well as 
proprioceptive and enteroceptive
stimuli” (177, p. 166). (Cf. 130.) 
Chocholle (32) discusses causes of 
fluctuation in auditory reaction time 
which are of central origin, such as 
the state of the sensory areas. 

The spontaneous activity of the 
nervous system presumably lies be- 
hind the extraordinary phenomena 
reported by Heron, Bexton, and Hebb 
(105). Subjects kept for a day or 
more in a condition involving a 
marked reduction in sensory stimulation
reported “visual imagery, dreamlike
in vividness,” akin to hallucination.

Differences in intra-individual variability
in biochemistry are emphasized
by Jellinek (123). Persky (170)
has compared the average intra-individual
variabilities for three biochemical
stress indices. A relationship
between emotional instability
and variability in blood constituents 
has been reported by Hammett 
(100) and by Goldstein (82). 

In analyzing variation, Crozier 
and Hoagland (42) distinguish be- 
tween errors of observation and re- 
cording and the real variation found 
in the responses of living organisms. 
The latter can be considered to be 
“unpredictableness” due to the extreme
complexity of organic systems.
They cite experimental evidence that 
variability is reduced by an increase 
in excitation and motor output. We 
may entertain the tentative hypothe- 
sis that, up to some point, as the 
demands of a stimulus situation in- 
crease the mobilization of the organ- 
ism, the variability of response de- 
creases. 

Empirical data on neural and 
physiological variability are avail- 
able. Blair and Erlanger (15) found 
that neural response thresholds and 
reaction latencies of individual axon 
fibers vary spontaneously from in- 
stant to instant. Herrington (106) 
measured basal metabolic rate, sys- 
tolic blood pressure, respiration rate, 
pulse rate, and rated activity 45 
times over 90 days for 11 subjects. 
With the exception of intercorrela- 
tions involving BMR, the median 
intercorrelation of the standard de- 
viations was as high as the median 
intercorrelation between means. Fur- 
thermore, the ratings on average 
activity level were related to both 
the mean levels and the variability 
of the three physiological measures 
(again excluding BMR). From an 
extensive study of basal metabolic 
rates, Harmon (101) showed that 
BMR measurements taken under 
standard conditions indicate con- 
siderable variation from day to day. 

In his studies of biological intelligence,
Halstead (95) has noted that
the average deviations of normal 
subjects on critical fusion frequency 
show cyclical variation over time. 
Schmidtke (190) observed that per- 
formance on tests of cff varied with 
time of day, the amount of vacilla- 
tion showing individual differences. 
McNemar studied variability in criti- 
cal fusion frequency on different 
days and under different conditions. 
She concluded that “individuals do
not exhibit the same degree of variability
in response errors under all
conditions of measurement” (144, p.
21). However, many of the inter- 
correlations between variability 
measures from different test condi- 
tions on the same day were significant 
at the .05 level. Variability for the 
same condition on different days was 
found to be relatively unstable. This 
study illustrates a major problem in 
measuring variability. Were the six 
observations per trial sufficient to 
yield a stable value? If not, would 
it be possible to increase the number 
of observations and still eliminate 
any effects of fatigue and lowered 
motivation? 

Fluctuations of minimal sensory 
stimulations at threshold have been 
studied for many years. Early work 
in this area is summarized by Guil- 
ford (87), who interpreted them as 
fluctuations of attention, but Bills 
(13) objected to this interpretation 
because the periodicities were differ- 
ent for different sense organs. (Cf. 
46, discussed below.) 

In this review, we have not at- 
tempted systematic coverage of the 
literature on fluctuations of atten- 
tion, partly because much of it ig- 
nores both inter- and intra-individual 
differences. For example, Butorin 
(22) studied variation in speed of 
addition as a measure of stability 
of attention. 

Brunswik argues that the organism 
must be flexible because it has such
incomplete knowledge of its environment:
“Ambiguity of cues and means
relative to the vitally relevant objects 
and results must find its counterpart 
in an ambiguity and flexibility of the 
proximal-peripheral mediating proc- 
esses in the organism” (21, pp. 257- 
258). The theme that the capacity 
to vary responses is essential to indi- 
vidual development and survival is 
implicit or explicit in many publica- 
tions. Not only must the individual 
respond differently to different situations
but he must also vary his
response to the same situation in 
order to adapt, i.e., to improve his 
adjustment to the situation. 


PSYCHOMETRIC ASPECTS OF 
VARIABILITY 


Before proceeding with the main 
body of this paper, the review of stud- 
ies of individual variability in re- 
sponse, we should consider briefly the 
various possible measures of varia- 
bility and some problems associated 
with this measurement. This discus- 
sion is necessary because the topic 
has not been systematically studied 
before. 


Approaches to the Measurement of 
Variability 

The phenomena of variability are 
usually viewed negatively. Efforts 
are usually made to minimize the 
extent of intra-individual variability. 
Thus one ordinarily seeks high test-retest
reliability (i.e., low variability)
and eliminates stimulus items to
which inconsistent (varying) responses 
are made. Our problem can be viewed 
as the measurement of the unreliabil- 
ity of the individual. Guttman 
(91) points out that there are three 
sources of psychometric variation: 
persons, items, and trials. He formu- 
lates an expression for the variance of 
an individual on an item. In his equa- 
tion for reliability he includes an 
error variance term which is the
mean of the error variances for individuals.
Thus he does not assume
that individuals are equally unreli- 
able, but rather allows for individual 
differences in unreliability. Guttman 
(92, 93) has also developed formulas 
for the reliability of qualitative 
(categorical) data. 

In discussing reliability, Coombs 
(37) also considers the individual 
reliability of the same item over time. 
This formulation is developed in a 
more generalized form in his Theory 
of Psychological Scaling (38). An 
earlier paper (35) considers the prob- 
lem of obtaining a dispersion score 
for an individual. Both Coombs and 
Guttman, however, give formal recog- 
nition to the concept without solving 
the practical problems involved in 
its measurement. 

In psychometric theory, the as- 
sumption is sometimes made that 
errors of measurement for two tests 
or for the same test on two occasions 
are uncorrelated. This assumption 
has been questioned by several
writers. Thouless (214) discusses 
and demonstrates the fluctuation of 
a mental function over time. The 
variation over time of a person’s
scores around his true score is noted 
by Ferguson (68) and by Brown and 
Thomson (18). 

Cattell (24, p. 105) holds that 
dynamic traits fluctuate even more 
widely from day to day. (Cf. also 
25 and 29.) 

An intensive study of components 
of test unreliability has been made 
by Fagin (65). He demonstrates 
that this unreliability is composed of 
random error variance plus quotidian 
variance (cf. 228, below) or consistent 
personal variation. For trait items, 
more quotidian variance was found in 
more reliable items, but this rela- 
tionship was not present in interest 
items. He concludes that the proportion
of quotidian variance can
be estimated and that quotidian
scores for people are reliable. 

A series of papers by Glaser (78, 
79, 80, 81) analyzes changes in the 
responses on retest. Defining incon- 
sistency as the number of responses 
changed on retest, he finds that the 
three intercorrelations of inconsisten- 
cy scores on three tests (intelligence, 
interests, personality) are uniformly 
low although two are statistically 
significant (78). He indicates that 
the number of changes between one 
pair of two trials (out of three) is 
related to the number between an- 
other pair and that the mean difficul- 
ty level of items with inconsistent 
(changed) responses correlates highly 
with performance (80, 81). (Cf. 
Yoshioka’s finding [231] that rats 
varied their choice more when the 
discrimination was more difficult. In 
a situation where two paths of differ- 
ent length were available, the pro- 
portion of choices of the shorter 
[“correct”] path was a function of the
ratio of their lengths, not their abso- 
lute lengths.) Glaser shows further 
that inconsistency scores have no 
relationship to level scores for a test 
with an effective range appropriate 
to the group tested. However, if a 
test is too easy or too difficult for a 
group and yields a markedly skewed 
distribution of scores, a relationship 
will be found between level score and 
consistency score. He also reports 
results consistent with Mosier’s find- 
ing (157) of high split-half reliability 
for the difficulty value of the median 
error. (See also 229, discussed below.) 

Cox (39) obtained substantial nega- 
tive correlations between variability 
from day to day and initial ability 
on a motor task. The relation be- 
tween variability and improvement 
varied from task to task. 

The possibility of a relationship 
between a variability score and the 
level score on the same responses
must always be kept in mind. Where
the level score contributes to or is a 
determinant of the variability score, 
its effects should usually be partialed 
out. However, it is conceivable that 
under some conditions, the variability 
itself may affect the level score. 

McReynolds (145) developed a 
measure of consistency which in- 
volved level of difficulty. His sub- 
jects were asked whether they could 
‘see’ a given concept in an indicated 
area on a Rorschach card. Ordering 
the concepts in terms of difficulty, 
he developed a score based on the 
relative disorder of the subject’s replies,
i.e., deviations from saying
“yes” to all concepts below a particular
level and “no” to all above.
This score approximates Coombs’ 
dispersion score of items for one indi- 
vidual (35). 
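A score of this general kind can be sketched as follows. The function below counts the smallest number of replies that depart from some perfect pattern ("yes" up to a cutoff, "no" beyond it). It is offered as one plausible reading of "relative disorder," not as McReynolds' actual scoring procedure, and the name disorder_score is hypothetical.

```python
def disorder_score(replies):
    """Minimum number of replies inconsistent with any perfect cutoff pattern.

    replies: list of booleans (True = "yes"), ordered from the easiest
    concept to the most difficult one.
    """
    n = len(replies)
    best = n
    for cut in range(n + 1):
        # Ideal pattern for this cutoff: "yes" to the first `cut` concepts,
        # "no" to all the remaining, more difficult ones.
        misses_below = sum(not r for r in replies[:cut])
        yeses_above = sum(bool(r) for r in replies[cut:])
        best = min(best, misses_below + yeses_above)
    return best
```

A perfectly ordered set of replies receives a score of 0; each reply that must be reversed to reach an ordered pattern adds 1.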

Given a series of repeated measurements
for the same individual, several
measures of their variability are
available and have been used: the
standard deviation (42, 136); the
average deviation and the range
(166). Measures of profile similarity
can frequently be used as measures
of variability: e.g., D, based on 
squared differences between paired 
responses, as discussed by Cronbach 
and Gleser (41). Such a measure 
may, however, be related to the 
means and sigmas of the response 
distributions. Correlation coeffi- 
cients may also be appropriate meas- 
ures in some instances. 
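The measures just listed are simple to compute. The sketch below assumes population formulas (division by n) for the standard deviation, and takes D as the square root of the summed squared differences between paired responses; the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def standard_deviation(xs):
    """Population standard deviation of one individual's repeated measurements."""
    return pstdev(xs)


def average_deviation(xs):
    """Mean absolute deviation from the individual's own mean."""
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def value_range(xs):
    """Difference between the largest and smallest measurement."""
    return max(xs) - min(xs)


def profile_distance(first, second):
    """D: root of the summed squared differences between paired responses."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(first, second)))
```

Applied to the two response sets of one individual on test and retest, profile_distance is 0 when no response changes and grows with the size of the changes, which is why it also reflects the means and sigmas of the response distributions.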

Noting that variation from day to 
day occurs even when external condi- 
tions are well controlled, Woodrow 
(228) has suggested the concept of 
“quotidian variability,” which may
be used to describe individuals and 
also to check on the stability of inter- 
nal conditions during an experiment. 
He recommends a ratio where the 
numerator is the standard deviation 
of the daily means and the denominator
is the average of the standard
deviations for each day divided by 
the square root of the number of 
trials per day. This formula measures 
variation from one day to the next 
in terms of variation within one 
occasion. However, we must not 
overlook the individual differences in 
intra-occasion variability. 
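Woodrow's ratio, as described above, can be written out directly. The sketch assumes an equal number of trials on every day and uses population standard deviations; whether Woodrow used these exact forms is not stated here, so both choices should be read as assumptions.

```python
from math import sqrt
from statistics import mean, pstdev


def quotidian_ratio(days):
    """Day-to-day variation relative to within-day variation.

    days: list of lists, one inner list of trial scores per day,
    with the same number of trials on every day (an assumption).
    """
    n_trials = len(days[0])
    if any(len(day) != n_trials for day in days):
        raise ValueError("every day must have the same number of trials")
    daily_means = [mean(day) for day in days]
    daily_sds = [pstdev(day) for day in days]
    # Numerator: standard deviation of the daily means.
    numerator = pstdev(daily_means)
    # Denominator: average daily standard deviation divided by sqrt(trials per day).
    denominator = mean(daily_sds) / sqrt(n_trials)
    return numerator / denominator
```

Because the denominator estimates the standard error of a daily mean, a ratio near 1 would be expected if the days differed no more than chance sampling of trials allows; larger ratios indicate day-to-day variation beyond that level.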

Some investigators (64, 72, 116) 
have utilized measures that take in- 
to account the relative position of the 
responses. 

Another method of studying se- 
quences of responses is spectral analy- 
sis, which yields a profile of the rela- 
tive contributions to the total vari- 
ance (or oscillations of performance) 
of each of the possible component 
waves. Abelson (1) demonstrated 
its use on a perceptual-motor task 
and compared it to a conventional 
measure, the variance. The correla- 
tion was .08, indicating the essential 
independence of these two measures of 
variability derived from the same 
set of responses. 
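The idea of a variance profile over component waves can be illustrated with a discrete Fourier transform. The sketch below is not Abelson's procedure; it simply computes, for an evenly spaced series, the share of the total variance carried by each frequency from one cycle per series up to the highest resolvable frequency. The function name is hypothetical.

```python
import cmath


def variance_spectrum(series):
    """Share of total variance at each frequency k = 1 .. n // 2.

    Frequency k means k complete cycles over the whole series.
    Raises ValueError for a constant series, which has no variance to divide.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    powers = []
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(centered)
        )
        # Frequencies below the highest one occur twice in the full transform
        # (as k and n - k); the highest frequency of an even-length series occurs once.
        weight = 1 if 2 * k == n else 2
        powers.append(weight * abs(coefficient) ** 2)
    total = sum(powers)
    if total == 0:
        raise ValueError("series has zero variance")
    return [p / total for p in powers]
```

A strictly alternating series places all of its variance at the highest frequency, whereas a slow drift concentrates it at the lowest; two series with the same overall variance can therefore have quite different profiles.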

On the other hand, De Valois (50) 
found substantial relationships (of 
the order of .70) among three meas- 
ures, even when each measure was 
based on a different set of responses. 
Using a five-unit maze, his measures 
were number of different paths used, 
number of specific choices changed 
from the choice on last trial, and num- 
ber of times a third, shorter alterna- 
tive was used when it was opened 
up. 

The intercorrelation of constancy 
scores (unchanged responses on re- 
test) has been shown by Dunlap (58) 
to be in the order of .40 for the several 
sections of his Preference Blank. The 
constancy indices had small positive 
relationships with the level scores for 
corresponding areas and even less 
relationship with intelligence. 

Many general problems of psychological
measurement must be reconsidered
in developing measures of
variability. For example, there is the 
assumption of additivity of responses 
which is generally accepted for level 
scores, in spite of an occasional 
strong protest. Since we obviously 
wish to avoid studying the vari- 
ability of each separate item, we can 
develop a total change score by 
counting the number of changed re- 
sponses (cf. 78, 81, 104, 138). This 
measure is similar to Zubin’s meas- 
ure of like-mindedness (233). 

Such a sum-of-changes score must 
be distinguished from a score based 
on change-in-total or level. Thus we 
can count the number of answers 
changed on a multiple-choice test 
or we can compute the change in total 
score from one trial to the next (cf.
136 and 206). But a change-in-total 
score is an index to change in some 
posited trait, not the tendency to 
change specific responses. 
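The distinction between the two scores can be made concrete. In the sketch below, the answer strings and the scoring key are hypothetical; sum_of_changes counts responses that differ on retest, while change_in_total compares the number of keyed answers on the two trials.

```python
def sum_of_changes(first, second):
    """Number of items answered differently on the two trials."""
    return sum(a != b for a, b in zip(first, second))


def total_score(answers, key):
    """Number of answers agreeing with the scoring key."""
    return sum(a == k for a, k in zip(answers, key))


def change_in_total(first, second, key):
    """Difference in total (level) score between the second and first trial."""
    return total_score(second, key) - total_score(first, key)
```

For first = "ABCA", second = "BBCD", and key = "ABCD", two responses change although the total score is 3 on both trials, so the change-in-total score is 0.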

Another distinction, which was 
not made in the last section, should 
be made explicit. Our basic model 
deals with change in response when 
the same stimulus is repeated. But 
there is also the concept of discrepan- 
cy between response to homogeneous 
stimuli, i.e., to stimuli so objectively 
similar that the same response is 
expected (cf. Type III variability). 
Applicable concepts here are Coombs’ 
homogeneity (37) and dispersion
score over items (35, 36). Any at- 
tempt to measure variability through 
discrepancies in responses to homo- 
geneous items must be based on the 
assumption that the degree of homo- 
geneity of two items is constant for 
all individuals. Otherwise, the dis- 
crepancy could be a consequence 
not of individual variability, but of 
the heterogeneity of the items for 
that individual. Thus the study by 
McReynolds (145), mentioned above, 
assumes that the concepts had the 
same order of difficulty or plausibility
for all subjects. Stated in another
way, McReynolds did not directly 
observe intra-individual variability 
or consistency but rather the extent 
to which each subject’s order of 
difficulty (as found on one occasion) 
corresponded with the order of average
difficulty for a group.


The Consistency of Variability (Relationships
with Time)


Variability involves the difference 
between two responses. A measure 
of variability is usually computed 
from several such differences between 
paired responses or from the several 
deviations from the central tendency 
of a series of responses. The several 
responses may be made on one oc- 
casion, on two occasions, or on several 
occasions. 

Are measures of variability con- 
sistent? Does variability within one 
occasion show internal consistency? 
Is variability on one occasion related 
to variability on another? Are vari- 
ability measures based on compari- 
sons of responses on two or more 
occasions internally consistent? Do 
variability measures show systematic 
trends over time? While definitive 
answers to all these questions can- 
not be given at this time, it is ap- 
propriate to examine the available 
evidence. 

Some work on variability suggests 
that variability has fair consistency 
over time, i.e., variability measures 
from different occasions are related 
to each other. Thorndike (213) re- 
ported continued variability in the 
repeated spellings of the same sound 
in nonsense syllables. In his work on 
covariation of efficiency of perform- 
ances on several tasks, Asch (7) noted 
that an individual’s variability re- 
mained constant as learning pro- 
gressed. To increase the stability 
of his measures of variability, Fryer 
(73) discarded the first and the last
of ten trials per day. In our analyses
of his data, the individual standard 
deviations for the second, third, and 
fourth days showed appreciable inter- 
correlations (.48 to .82) with each 
other but not with the standard 
deviations for the first day. Flügel’s 
measure of oscillation (71) was highly 
consistent from the first half to the 
second half of 46 daily sessions. (Cf. 
Lovell [142] discussed below, and 
De Valois [50] who used an approach 
analogous to Cronbach's coefficient 
of stability and equivalence [40, pp. 
69-70].) 

Scores for fluctuation in attitudes 
and sentiments between two ses- 
sions a day apart were related to 
scores for fluctuations between ses- 
sions a month apart (Cattell, 23): 
the correlation was .47 for children 
and .77 for adults. Preliminary 
studies by the authors suggest that 
such fluctuation scores are correlated 
with the tendency to make extreme 
ratings, but the relationship is not 
an artifact: unlike those giving many 
extreme ratings, individuals placing 
most of their ratings in the center 
do not change their occasional ex- 
treme responses any more than they 
change their more moderate, center 
responses. 

Cummings (43) measured varia- 
bility within three-minute periods. 
For three diverse tests, such scores 
were highly consistent over ten peri- 
ods. When the ten periods were on 
the same day, the average scores for 
the three tests had only low intercor- 
relations. However, when the ten 
periods were on ten different days, 
the intercorrelations between the 
average variability scores (for each 
of the three tests) were sufficiently 
high to suggest a common variability 
factor. 

The reliability of a variability score 
may also be a function of the nature 
of the responses studied. Allport 
and Vernon (3) reviewed earlier 
literature and concluded that vari- 
ability measures derived from “raw 
physical scores,” such as tapping, 
are reliable but highly specific, while 
lower reliability is found for variabil- 
ity measures derived from “more 
complex tests of intelligence or per- 
sonality whose initial reliability is not 
high” (3, p. 128). Also, according 
to them, Reymert (178) found “that 
the measure of variability in reaction 
time failed to correlate with the varia- 
bilities of more integrated activities 
such as reading and counting” (3, p. 
132). 

A fundamental consideration is the 
time period between responses. 
Dudek (56) considers that the vari- 
ability of individuals often contrib- 
utes to inconsistency in test scores. 
He finds that the Spearman-Brown 
prophecy formula holds for the split- 
half reliability of a test on a single 
occasion but may not hold for test- 
retest reliability. Paulsen (168, 
169) also found that split-half reli- 
abilities did not predict test-retest 
values and reported that intertrial 
correlations decreased as the number 
of intervening trials increased. (The 
latter tendency was also noted by 
Hertzman [107] and by McNemar 
[144].) The studies by Dudek and 
Paulsen used perfectly homogeneous 
tests of steadiness; the “stimuli” 
remained the same throughout. Thus, 
variability over time is related to, 
but is not identical with, inconsisten- 
cy of performance on a single occa- 
sion. 
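
The Spearman-Brown prophecy formula referred to above predicts the reliability of a lengthened test from the reliability of a shorter one. The sketch below states the standard formula in Python; the function name and the example value of .60 are illustrative assumptions, not figures taken from Dudek or Paulsen.

```python
def spearman_brown(r, k=2.0):
    """Predicted reliability of a test lengthened k times, given reliability r."""
    return (k * r) / (1.0 + (k - 1.0) * r)


# Hypothetical half-test correlation of .60, stepped up to full length (k = 2).
full_length = spearman_brown(0.60)
print(round(full_length, 2))  # 0.75
```

The formula treats the added material as parallel to the original. The findings cited above suggest that this may be reasonable within a single occasion but need not hold when the "added" responses come from a later occasion.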

Variability over time may occur 
under either of two conditions, de- 
fined in terms of rate of organic 
fluctuation. (a) While the organism 
may respond consistently within 
any short test session, it may func- 
tion differently on different occasions 
clearly separated in time. (b) On 
the other hand, the organic processes 
determining the response to a given 
stimulus may be in a continual state 
of flux, such that two very different 
responses are made just a few sec- 
onds apart. If this condition exists, 
then the responses within a given 
occasion are a random sample of 
the individual's responses to the 
stimulus and variation over time can 
be predicted from variation within 
an occasion. If the first condition 
holds, variation within an occasion 
may be unrelated to variation over 
time. 

On the other hand, many studies 
show that variability itself may be a 
function of the number of trials: 
variability may increase or decrease 
as the series of responses continues. 
Thorndike (210) reports a slight re- 
duction in variability in drawing 
hundreds of lines of specified length. 
With reward or punishment, the 
average error (variability) in a dis- 
crimination task was shown by 
Hamilton (97) to decrease as the 
number of trials increased. Lashley 
(Crozier and Hoagland, 42) found 
that in archery practice, the stand- 
ard deviation decreased but the rela- 
tive variability was constant. For 
addition problems, Flügel’s subjects 
showed increased absolute oscillation 
but decreased percentage oscillation 
over 46 daily sessions (71). Sarvis 
(188) noted that rhythms in speed 
of tracing mazes tended to disappear 
over time. In Hall's study (94), 
rats showed less variability on later 
trials in a five-alternative maze. 
Vacillation decreased with training 
in the experiment of Mowrer and 
Jones (159). Variability during condi- 
tioning, extinction, and recondition- 
ing was studied by Antonitis (5). He 
found that variability of response 
decreased as a function of the number 
of reinforcements during condition- 
ing, increased rapidly during extinc- 
tion trials, and decreased during 
reconditioning below the level for 
conditioning trials. Greater variabil- 
ity in output on a continuing task is 
associated with increased fatigue, 
according to Bills. This represents a 
“breakdown in the controlling set” 
(14, p. 54). 

Repeated self-descriptive Q sorts 
have been analyzed by D. M. Taylor 
(206). In a series of sorts, later ones 
showed higher intercorrelations than 
earlier ones. The consistency of self 
concept over time correlated .33 
with the relative positiveness of the 
initial self-description. Sorts for 
self were less consistent than sorts 
for ideals. (Cf. 117, discussed below.) 

The Studies in Expressive Move- 
ment, by Allport and Vernon (3), 
provide some highly provocative 
data. They tested 25 subjects on a 
wide variety of tests in three different 
sessions. Although they noted that 
the reliability of level scores was 
lower between sessions than within 
sessions, they utilized individual vari- 
ability scores based on measures from 
one, two, or three sessions, thus 
confounding intrasession and inter- 
session variability. Furthermore, 
while some of the scores were based 
on variability of response with the 
stimulus situation essentially un- 
changed, other variability scores 
reflected the difference between re- 
sponses to similar but clearly non- 
identical stimuli. Also, they utilized 
several different types of scores, such 
as average deviation, coefficient of 
variability, etc. For these reasons, 
their data are difficult to interpret. 

The average intercorrelation among 
11 rank orders on variability was .02. 
They concluded, however, that there 
was evidence for general variability 
and specific variability components. 
The first correlated .26 with their 
Emphasis factor and the second 
correlated .38 with Centrifugality. 

On the other hand, the data pub- 
lished by Hunt (116) show that for 
both normals and schizophrenics, 
the rank orders of individuals on 
variability over 15 trials five days 
apart maintain consistency 
from test to test. (For each group, 
our analysis of his data yielded a 
chi square from the analysis of vari- 
ance of ranks with a p value less than 
10.) 

Set toward task. Up to this point, 
we have assumed that the task and 
the individual’s orientation toward 
it are constant. Some studies, how- 
ever, suggest that the individual's 
set toward the task may change as 
the task proceeds (73, 120), and that 
differences in set may affect variabil- 
ity. 

Abelson (1) reports that the indi- 
vidual variances tended to decrease 
during the course of the first session 
but tended to increase during the 
second session. He interprets the 
first trend as due to accommodation or 
learning and the second as due to 
boredom. 

Changes in set occurring later in 
a series of tests may also affect vari- 
ability. For rats trained (with food 
reward) to make brightness discrimi- 
nations in a maze with a variety of 
indirect paths, Maier (146) noted 
tendencies for more errors in later 
trials on the same day and on later 
days for the same pattern. He at- 
tributed the errors to inattention 
rather than ignorance. Variability 
increased when the correct path was 
left unchanged for several days. 


Taylor (207) reported a study by 
Danziger in his laboratory in which 
the variability of the autokinetic 
effect remained unchanged over time 
for volunteer subjects but decreased 
markedly for paid subjects. In Phil- 
pott’s work (172), any tendency 
toward increase in cycle length 
might well be due to fatigue or bore- 
dom (cf. also 127). 

The problem of set in relationship 
to the variability of schizophrenics 
will be discussed in the section on 
“Personality types and diagnostic 
categories.” 


Summary 


The measurement of variability 
involves a number of problems. Some 
of them are similar to those present 
in conventional psychological meas- 
urement which emphasizes level 
scores. Other problems are peculiar 
to the study of variability. We must 
guard against implicitly carrying 
over the assumptions for level scores 
as we explore this less familiar area. 

Because research on variability 
has been scattered and unsystematic, 
a number of different measures have 
been tried, each implying its own 
concept of variability. We are not 
yet in a position to decide upon the 
ideal measure (if one exists). 

Are measures of variability within 
one occasion consistent or stable over 
time? The answer appears to be a 
qualified one: under highly constant 
conditions, variability measures with- 
in different occasions may agree well; 
however, variability measures are 
readily affected by the set of the sub- 
ject toward the task, and therefore 
by changes in such set. On the other 
hand, measures of variability over 
occasions may be reasonably stable. 


STUDIES OF SPONTANEOUS 
(APERIODIC) VARIABILITY 


In this section we shall consider 
studies reporting variability which fits 
closely to our basic paradigm (Type 
I, spontaneous variability). While 
such variability has been observed 
in a wide range of behavioral re- 
sponses, it has usually been classified 
as error variance without further 
interpretation. It is our thesis that 
this type of variability is a lawful 
characteristic of an individual in a 
situation, and that its investigation 
will enlarge our understanding of 
behavior. 

The need and the capacity for 
variation have been distinguished. 
Many years ago, Tolman empha- 
sized the initial exploratory tenden- 
cies in the rat (216) and creative 
instability, the capacity to break out 
into new lines of activity (217). 
Hilgard (108) points out that vari- 
ability may mean a need or prefer- 
ence for variety and also an ability 
to vary behavior in a given situation. 
Both meanings refer to variables 
underlying the observed phenomena. 
The need for variety involves a 
reaction to previous stimuli or re- 
sponses (cf. Type II). In our ap- 
proach, we assume that the potential- 
ity for changing a response always 
exists. Maier (147) also mentions a 
need for variability and argues that 
in some instances (e.g., operated 
rats), the capacity to vary behavior 
exists but is not used. In a later 
paper, with Schneirla (149), he again 
considers the tendency to vary be- 
havior. Mowrer and Jones (159) 
explored variability as a function of 
the effort involved in the responses. 
In Cattell's formulation (25, p. 635), 
there is a “law of dispersion with 
excitement and deprivation. Con- 
tinued stimulation of ergs, with dep- 
rivation of the goal, produces in- 
creasing variation in the stimuli to 
which attention is directed and in- 
creasing variation of response be- 
havior (as well as introspectively, 
increased ‘excitement’).” 

In his formal theory, Hull (113, 
pp. 304-321) gave explicit recogni- 
tion to the problem of variability 
by his concept of behavioral oscilla- 
tion. This oscillation was postulated 
to be specific to each reaction po- 
tential, i.e., the oscillations of differ- 
ent reaction potentials are asynchro- 
nous. He also called attention to other 
kinds of variability: “The ‘constant’ 
numerical values appearing in equa- 
tions representing primary molar be- 
havioral laws vary from species to 
species, from individual to individual, 
and from some physiological states 
to others in the same individual at 
different times, all quite apart from 
the factor of behavioral oscillation 
(sOR)” (114, p. 117). Subsequently, 
he considered oscillation in relation 
to conflict situations, behavioral in- 
consistency in evaluative choices, 
and alternation tendencies (115). 
Taylor (207, 208) discussed the con- 
cept of behavioral oscillation and 
argued that it is related to strength 
of drive. 

Statistical learning theory has rec- 
ognized the problems of stimulus and 
response variability (Estes, 61; Estes 
and Burke, 62). Their formulations 
may lead to a model that can be 
fitted to the intraorganic determi- 
nants of response variability. 

Psychophysics would seem at first 
thought to be an excellent source of 
data fitting our Type I model of 
variability, since many of its methods 
involve repetitions of the same stimu- 
lus. However, most of the research 
in this area is normative—it is con- 
cerned with general functions, not 
differences between individuals. Thus 
Guilford (88) neglects differences in 
intra-individual variability. Thur- 
stone (215) points out that the slope 
of the psychometric function for the 
constant method indicates degree of 
sensitivity. Bevan and Dukes (11) 
show that smaller average errors are 
obtained in judging the distance of a 
more valued stimulus object, and 
Dukes and Bevan (57) found less 
variability of response in comparing 
two positively-valued objects than 
in comparing two negative objects 
or a positive and a negative object. 

Accuracy of performance. Some of 
the first work (e.g., 210) was on vari- 
ability in accuracy of performance. 
For several studies of repeated test- 
ing, Thorndike (211) studied the 
distribution of the individual’s de- 
viation from his own average. Since 
this study utilized alternate forms of 
the tests, it does not fully fit the 
paradigm (cf. 213). 

Flügel (71) studied intrasession 
oscillation and day-to-day variability 
in continuous addition problems. 
Absolute oscillation was measured by 
the sum of changes between succes- 
sive brief periods and absolute vari- 
ability was based on deviations from 
the mean of the five sessions centered 
around a given day. Using both 
absolute and relative measures, he 
found that oscillations and variabil- 
ity were positively associated. He 
points out that in his data, oscilla- 
tions usually represented dips from 
a relatively steady rate, rather than 
spurts of faster performance. 

Hertzman (107) developed a vari- 
ability score based on deviations 
from each subject’s median on the 
Thurstone Substitution Test. For 
a group of highly variable subjects, 
the size of the intercorrelations be- 
tween variability scores for different 
trials was a function of proximity 
in time. A similar but less marked 
trend was found in the low varia- 
bility group. 

Variability in hand-arm steadiness 
within one session was studied by 
Lovell (142). Three measures were 
used: average deviation, relative 
variability (average deviation di- 
vided by mean), and sum of succes- 
sive differences between pairs of 
trials. For all three, substantial cor- 
relations were obtained between 
scores for two sessions a month apart. 
The values ranged from .49 to .74, 
with those for relative variability 
being the lowest. (Scores were com- 
puted for each of three time intervals: 
1, 3, and 12 seconds.) 
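
The three measures named above can be sketched as follows. This is only an illustration of the verbal definitions given in the text; the function names and the steadiness scores are hypothetical, and Lovell's exact scoring procedure is not reproduced here.

```python
from statistics import mean


def average_deviation(trials):
    """Mean absolute deviation of the trials from their own mean."""
    m = mean(trials)
    return mean(abs(x - m) for x in trials)


def relative_variability(trials):
    """Average deviation divided by the mean, as described in the text."""
    return average_deviation(trials) / mean(trials)


def sum_successive_differences(trials):
    """Sum of absolute differences between each trial and the next."""
    return sum(abs(b - a) for a, b in zip(trials, trials[1:]))


# Hypothetical scores for one subject on five trials in one session.
trials = [4, 6, 5, 9, 6]
print(round(average_deviation(trials), 2))     # 1.2
print(round(relative_variability(trials), 2))  # 0.2
print(sum_successive_differences(trials))      # 10
```

The first two measures ignore the order of the trials, whereas the third depends on it, so the three need not rank subjects identically.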

Jarrett (122) reports individual dif- 
ferences in proportions of responses 
changed on a multiple-choice test. 

Output or magnitude of response. 
The well-known work of Dodge (54) 
employed a series of measures at 
different levels of neural integration. 
Darroch (44) reported daily fluctua- 
tions on a perseveration test (where 
the score is actually based on the dif- 
ference between performances on 
two tasks). Johnson (124) found wide 
individual variation in finger pres- 
sure. 

Marked individual differences in 
variability of judgments about a 
series of colors were reported by 
Hunt and Flannery (117). Variabil- 
ity increased with the number of 
colors to be judged and the number 
of categories used, but decreased 
as the number of repetitions mounted. 

David and Rabinowitz (45) de- 
veloped an instability score based on 
changes in preference for the Szondi 
pictures. Our analysis of their data 
shows substantial correlations be- 
tween instability scores for the eight 
‘factors’ in the test, the average 
intercorrelation being .42 for nurses 
and .62 for paranoid schizophrenics. 
(These are somewhat spurious: be- 
cause it is a forced-choice test, a 
change in preference for one picture 
must produce a change in preference 
for some other picture.) 

Several studies of the reliability 
of paper-and-pencil inventories have 
data on variability. Substantial in- 
ternal consistencies were found by 
Lentz (138) for number of items 
changed and for change in total score, 
these two measures correlating .40. 
As we might expect, certain items 
contributed more changes than 
others, especially the more general 
and abstract ones. Glaser’s criticism 
(78) that the relationship between 
level scores and change scores is a 
function of the range and design of 
the test may be applicable to these 
findings (cf. 163, 173). 

Variability has also been noted by 
students of personality. The relia- 
bility of personality measurements is 
a function of the stability of the 
individual personality, according to 
Maller (150). Roshal (185) concludes 
that successful therapy is associated 
with gains in behavior variability 
which contributes to adaptability. 
Rapaport (176) observes that “vari- 
ability of reactions” follows from “the 
multiple determination of psychic 
events.” Saudek (189, p. 62) dis- 
cusses changes of pressure in hand- 
writing. Spontaneous changes in 
characteristics of responses within 
one Rorschach examination are dis- 
cussed by Beck (9). In a study of 
changes in responses to a modified 
Rorschach test, Siipola et al. (196) 
found that almost half the responses 
were changed more or less. Allen 
et al. (2) report that two-thirds of the 
responses were changed. Gibby (75) 
studied the stability of intellectual 
variables for repeated Rorschach 
tests. Fluctuations from test to 
retest on the Bender-Gestalt are re- 
ported by Pascal and Suttell (165), 
who suggest that attitude may be the 
explanation. It seems reasonable 
to hypothesize that some variability 
can be accounted for by change in 
attitude or set; in fact, it is possible 
that differences between individuals 
with respect to variability may be a 
function of differences in the strength 
of a set and in the capacity or need to 
maintain it. 

Other work. Several studies cannot 
be grouped into the categories used 
above. The variation in response 
pattern shown by mammals when 
confronted with an unsolvable task 
is more a characteristic of the indi- 
vidual than of the species, according 
to Hamilton (96). 

Variability of the instrumental 
behavior in guinea pigs was studied 
by Muenzinger (160), who noted 
that plasticity remained even after 
one thousand trials in a puzzle box. 
Some animals would repeat one move- 
ment for a large number of trials and 
then shift to another one. In a sub- 
sequent study (Muenzinger, Koerner, 
and Irey, 161), animals which were 
required to make a specific movement 
in order to escape showed variation 
in the required movement but also 
showed a greater number of accessory 
movements. 

Guthrie and Horton (90) observed 
the behavior of cats in repeated 
escapes from a puzzle box. They 
concluded that the cat tends to re- 
peat his previous response. Inspec- 
tion of their photographic records, 
however, indicates marked differences 
between the cats in their degree of 
consistency. 

Rimoldi mentions a relationship 
between mean reaction time and 
variation: “a small coefficient of vari- 
ation and high speed tend to go to- 
gether, while in slow individuals the 
variability may be either high or low” 
(181, p. 298). (Cf. also 136, 146.) 

The studies discussed above have 
demonstrated variability in perform- 
ance with respect to accuracy, magni- 
tude, and rate, and have illustrated 
variability in instrumental acts. The 
evidence justifies the authors’ view 
that significant components of such 
variability can be isolated and inter- 
preted. 


STUDIES OF SYSTEMATIC 
VARIABILITY 


In this section we shall consider 
investigations using repetitions of 
the same stimulus in the same situa- 
tion where the responses show no 
monotonic trend but where some 
regularity or order is noted (Type II 
variability). Thus studies involving 
improvement and deterioration of the 
quality, strength, or rate of response 
(e.g., due to learning or fatigue) will 
be omitted, but we shall include data 
that are viewed as a stationary time 
series, where the starting point is 
immaterial. We shall omit references 
to phenomena showing fixed cycles 
such as diurnal fluctuations. 

Most examples of systematic vari- 
ability can be classified as reactive 
variability—the subject responds not 
only to the re-presented stimulus but 
also to the previous presentation or 
to his previous response to it. In a 
discussion of difficulties in using re- 
peated measurements, Smith (198) 
points out that successive responses 
depend more or less on preceding 
stimulus-response situations. The 
problem is made more complex by 
the fact that the extent to which a 
response is influenced by preceding 
responses varies throughout the series, 
perhaps even from one response to 
the next. Johnson (according to 
London, 141) also argues that the 
state of the organism is altered by 
the previous response. The non- 
independence of successive responses 
in measuring visual threshold was 
reported and analyzed by Verplanck 
et al. (221). Wertheimer (226) cor- 
roborated this result and also found 
significant interaction variance be- 
tween days and subjects. 


Serial Order 


Variability measured by differences 
between successive responses falls 
somewhere between the major Type 
I and Type II classifications. Since 
the order of the responses is con- 
sidered, it does not meet a rigorous 
interpretation of the first category. 
On the other hand, it does not imply 
the degree of regularity found in the 
reactive or periodic forms of system- 
atic variability. 
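
The distinction drawn here can be illustrated with a small sketch: a measure based on deviations from the central tendency is unchanged when the same responses are reordered, whereas a successive-difference measure is not. The two response series and the function name are hypothetical and serve only to illustrate the point.

```python
from statistics import pstdev


def sum_successive_differences(responses):
    """Sum of absolute differences between each response and the next."""
    return sum(abs(b - a) for a, b in zip(responses, responses[1:]))


# The same six hypothetical responses in two different serial orders.
drifting = [1, 2, 3, 4, 5, 6]
alternating = [1, 6, 2, 5, 3, 4]

# The standard deviation ignores order, so it is the same for both series.
same_sd = abs(pstdev(drifting) - pstdev(alternating)) < 1e-12
print(same_sd)  # True

# Successive differences depend on order.
print(sum_successive_differences(drifting))     # 5
print(sum_successive_differences(alternating))  # 15
```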

Philip (171) had twelve subjects 
tap alternately on two plates at 
maximum speed until exhausted. In 
seven subjects he found evidence 
for periodicity, the periods being 
about 50 minutes in length. The 
periodicity was measured by three 
methods—one of which was the non- 
randomness of distribution of the 
most frequently occurring intervals 
between taps. He distinguished 
between periodicity (which may be 
an efficient mode of operation be- 
cause it provides opportunity for 
recovery) and variability due to 
spurts and to the effects of distrac- 
tions. 

Abelson (1) applied spectral analy- 
sis to performance on repetitive 
tasks. This technique assesses the 
components of the output curve 
without assuming periodicity. Using 
an analogy from engineering, he 
asked whether individuals showed 
differences in degree and type of 
“out-of-controlness.” (His study 
does not completely fit our paradigm 
since his stimuli had some variety, 
but he assumed with some justifica- 
tion that the analysis could disre- 
gard these differences.) 

The task was a perceptual-motor 
one: making 100 jabs at each of five 
targets. The technique permitted 
recording the performance with the 
subject having only minimal knowl- 
edge of results. 

Abelson obtained repeat measure- 
ments on some subjects. As one might 
expect, the repeat reliability of vari- 
ability measures for the first task 
was much lower than for the suc- 
ceeding four tasks. While the vari- 
ance for each individual and the 
value derived from spectral analysis 
were uncorrelated with each other, 
each showed stability from test to 
retest. It may be worth noting that 
the retested group were volunteers; 
as a group, they showed on their 
first test a higher level and a greater 
scatter of their individual variances 
than did the group not volunteering 
for retest. 


The Conceptualization of Systematic 
Variability 

The most prominent figure in the 
history of work on variability is 
Raymond Dodge. He was one of the 
few people to study the general 
problem. He pointed out that vari- 
ability could be viewed as accidental 
deviations from some abstracted 
“true” measure, but that such an 
approach with its emphasis upon 
invariants might overlook significant 
aspects of behavior (52): 

The psychophysical organism is in a per- 
petual state of flux. . . . Moreover, the neuro- 
muscular consequences of two successive in- 
stances of stimulation with physically similar 
stimuli vary not only according to the momen- 
tary conditions of the organism and its psy- 
chophysical set, but also according to inner 
reactions and inhibitions. As is well known, 
the repetition of identical stimuli may not 
evoke the same reaction in successive instances 
(55, p. 5). 


Dodge first published his two laws 
of relative fatigue in 1917. “Within 
physiological limits, all fatigue dec- 
rement in the results of work is rela- 
tive to the intensity of the stimulus” 
(51, p. 102). “In any complex of 
competing tendencies, the relatively 
greater fatigue of one tendency will 
tend to eliminate it from the com- 
petition in favor of the less fatigued 
tendencies” (51, p. 105). Recogniz- 
ing the phenomenon of avoidance 
of repetition of a response, he sug- 
gested the possibility of making use 
of it to study the nature of inner 
stimuli. He points out that relative 
fatigue should be viewed positively, 
as a conserving mechanism helping 
to prevent exhaustion. 

Dodge (53) collected voluminous 
data on several levels of neuromuscu- 
lar processes, for one subject. The 
records extended over two years and 
were taken at various times of day, 
etc. Unfortunately, Dodge attempted 
to interpret his data by a series of 
analogical constructs derived from 
contemporaneous neurophysiology. 
Thus he extended the neural concept 
of refractory phase to reduced re- 
sponsiveness persisting for minutes 
and even years. 

Solomon (201) notes that Hunter, 
observing a tendency to alternate 
(118), assumed it was innate (119). 
Hull (113) utilized “reactive inhibi- 
tion” in his system, and Underwood 
(220) makes this concept the basis of 
his explanation of alternation. Roth- 
kopf and Zeaman (186) suggest that 
the alternation tendency has a large 
“response” component and two small 
“place” components. Hebb (103, 
p. 228) has also suggested an inter- 
pretation of this phenomenon. 

Two papers by Glanzer (76, 77) 
promulgate the provocative concept 
of stimulus satiation and give experi- 
mental evidence to support it. His 
basic assumption is that with con- 
tinued exposure to the same stimuli 
in the same environment, the organ- 
ism becomes less active. Every 
moment the organism perceives a 
stimulating object, there develops a 
quantity of stimulus satiation to it. 
Glanzer maintains that the same 
principle holds for all multiple- 
choice and free response situations. 

To support his theory, Glanzer 
provides a telling critique of response 
theories which are based on the avoid- 
ance of the last response. Most of 
the studies in this area can be inter- 
preted by either theory because it is 
difficult to determine whether the 
stimulus or the response is being 
avoided. There are few crucial ex- 
periments other than Glanzer’s own 
work. Although Glanzer does not 
attempt to do so, it may be possible 
to adapt his theory to apply to verbal 
responses. While the nonrepetition 
of a recent verbal response might 
seem to be a case of reactive inhibi- 
tion, it could be explained as the 
avoidance of stimulus satiation. If 
the organism is repeatedly exposed 
to the same stimulus in the same en- 
vironment, and if it is forced to re- 
spond (i.e., if it cannot become less 
active), it will introduce variety in 
its responses. It is obvious that non- 
repetition of response will reduce 
boredom. 

In this connection, an early study 
by Robinson and Bills (182) is per- 
tinent. On the basis of introspective 
reports, they conclude that the rapid 
repetition of homogeneous responses 
succeeds best without full attention; 
subjects were most efficient when 
they were having “concrete fan- 
tasies.” 


The Nonrepetition of the Preceding 
Response 


A large number of papers deal 
with the relationship between suc- 
cessive pairs of responses. In a situ- 
ation with repetitions of the same 
stimulus and with two or more com- 
parable alternatives, is there a tend- 
ency for a response to be different 
from the preceding one? 

Motor responses. Lewin (139) re- 
ports an experiment where, with some 
pressure on them to continue as long 
as possible, subjects drew repeated 
moonfaces. While morons either 
drew them, or broke off to pause or 
to do something else, normal sub- 
jects used secondary activities and 
other means to keep going. 

Most of the work on alternation 
of motor responses has been done 
with rats (see reviews in 115 and 
201). Yoshioka (230) reported that 
only a small proportion of his ani- 
mals distributed their responses even- 
ly between two equally good paths 
to a goal, and these showed alterna- 
tion tendencies. MacGillivray and 
Stone (143) concluded that the tend- 
ency to alternate responses is much 
stronger than the tendency to repeat. 

Wingfield and Dennis (227) report 
91 per cent alternation for rats given 
two trials a day but only 68 per 
cent for six trials a day. Dennis 
later (48) concluded that alterna- 
tion effects were due not to a tend- 
ency to alternate direction but to a 
tendency to avoid a specific pathway 
already taken. However, this latter 
tendency was not found in mazes 
with more than two choice points. 
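
Percentages of alternation such as those quoted above can be computed from a record of successive choices. The sketch below assumes one plausible scoring rule, counting every trial after the first as an alternation when it differs from the immediately preceding choice; the source does not state the exact rule each investigator used, and the function name and the sample record are hypothetical.

```python
def percent_alternation(choices):
    """Percentage of trials (after the first) that differ from the preceding choice."""
    pairs = list(zip(choices, choices[1:]))
    if not pairs:
        raise ValueError("need at least two trials")
    alternations = sum(1 for prev, cur in pairs if cur != prev)
    return 100.0 * alternations / len(pairs)


# Hypothetical record of left/right choices for one animal on one day.
day = ["L", "R", "L", "L", "R"]
print(percent_alternation(day))  # 75.0
```

With two equally likely and independent alternatives, the chance expectation under this rule is 50 per cent, which is the baseline against which the reported figures are read.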

Leeper (137) trained rats to choose 
one path when hungry and the 
alternative path when thirsty. In 
considering responses on the second 
trial on each day, he invoked the idea 
of a “systematic tendency to vari- 
ability.” 

Further light was thrown upon 
spontaneous alternation by Heathers 
(102) who found that the percentage 
of alternation decreased as the time 
between trials increased from 15 to 
120 seconds; when the interval was 
15 minutes, alternation disappeared. 
Weitz and Wakeman (225) reported 
that alternation decreased to a mini- 
mum at intervals of 40 to 50 seconds 
and then rose with longer intervals. 
Alternation was noted by Riley and 
Shapiro (180) for trials 25 seconds 
apart (but not for trials 5 minutes 
apart). This tendency declined as 
the trials continued. 

After Solomon's extensive survey 
of “The Influence of Work on Be- 
havior,” he concludes that “work 
acts to produce negative motivation” 
(201, p. 35). Solomon (200) was 
able to increase the percentage of 
alternation to a maximum of 90 
per cent by making rats go up an 
inclined ramp (according to Zeaman 
and House, 232). Rothkopf and 
Zeaman (186) report that alternation 
was increased by more forced trials 
and also that it increased as the 
series of daily trials continued. In 
a very recent study using meal 
worms (85), Grosslight and Ticknor 
found that alternation is increased by 
a forced choice and by the summated 
effect of two preceding turns, but is 
decreased by a longer distance from 
the last choice point. 

A series of experiments on spon- 
taneous alternation has been re- 
ported by Montgomery (152, 153, 
154, 155, 156). He believes that 
spontaneous alternation may be a 
special case of exploratory tendency. 
The exploratory tendency is reduced 
by the exposure to a place (e.g., an 
arm of a maze), thus leading to an 
explanation based on place avoid- 
ance, not response avoidance. This 
point of view resembles Glanzer’s 
concept of stimulus satiation (77). 

Montgomery concluded that “in 
simple maze-situations, amount of 
exploratory behavior decreases as 
time of exploration increases, and 
increases proportionally as the area 
available for exploration increases” 
(154, p. 584). The same tendency 
reappeared on each day’s trial (155). 
In both studies, the proportion of 
alternations in sequences of locomo- 
tor responses was above chance. 

In another study (153), the per- 
centage of alternation declined as 
the interval between trials was 
lengthened. Within each block of ten 
trials, the alternation rate was con- 
stant from the first half to the second. 
The effort required to press the bar 
did not affect alternation. The dis- 
crepancy between this finding and 
Solomon's (see above) may be due 
to any of several factors, such as the 
kind of work involved. 

One group of papers has empha- 
sized the absence of variability. 
Hamilton and Ellis (98) concluded 
that normal rats showed persistency 
in seeking a goal, but variability in 
their behavioral activities, whereas 
operated rats showed behavioral con- 
stancy, a relatively fixed sequence 
of behavioral acts functioning as a 
unit. Hamilton and Krechevsky 
(99) demonstrated that a shock ad- 
ministered just before the rat reached 
the choice point was associated with 
a reduction in variability and a fixa- 
tion on one choice or the other. In 
several experiments, the reduction of 
variability by shock and by conflict 
was found by Everall (63) who noted 
a tendency for the same rats to per- 
severate under the different condi- 
tions. Krechevsky published three 
papers on “Brain-Mechanisms and 
Variability.” In the first (132), he 
found that normal rats used more 
different paths to reach a goal and 
shifted the path used more often than 
did operated rats. In the latter 
group the size of the lesion was nega- 
tively related to the number of paths 
used. Since he did not count paths 
on which rats made errors, part of 
these findings may be a result of the 
fact that the operated rats made six 
times as many errors as the normal 
rats. In the second study (133), 
normal rats preferred a longer path 
of varying shape to an alternative, 
shorter path of fixed length, but oper- 
ated rats did not. On the other hand, 
in an experiment where the varying 
path was much shorter, no significant 
difference was found between the two 
groups: both chose the varying, shor- 
ter one only slightly more than half 
the time (134). Krechevsky con- 
cluded that operated rats can show 
variability of response when there is 
a large difference in the efficiency of 
the alternatives. However, these are 
group averages: the operated rats 
showed a greater tendency to per- 
severation, to repeat their last re- 
sponse, especially in the second 
experiment (cf. 147). 

Verbal responses. Most studies of 
patterning in the response sequences 
of human subjects have used guesses 
or psychophysical judgments. Since 
the literature in this area has been 
reviewed recently (194, 201), we 
shall not attempt a complete cover- 
age here. 

The nonrandom patterning of ver- 
bal responses was noted and discussed 
by Thorndike (212) and by Dodge 
(54). In his intensive study of one 
subject, Dodge (53) included vocal 
reaction time to a [...] 
presented in random order. ([...] 
written associations, Telford [...] 
found less repetition with [...] 
time intervals between responses.) 
In a study of speed of naming colors, 
Bills (12) believed he had evidence 
for blocking. Abelson (1) reanalyzed 
Bills’ data and found no rhythmicity. 

In a series of five choices between 
two alternatives, Goodfellow (83) 
found a tendency to avoid symmetri- 
cal patterning. Skinner (197) argued 
that Goodfellow’s data could best 
be explained as a tendency to alternate 
which is strengthened if the two 
preceding responses show repetition, 
and is weakened when they show 
alternation. Solomon (202) found 
that alternation tendencies in guess- 
ing were unaffected by the interval 
between guesses or by the effort 
required to record their guesses. 
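
Skinner's interpretation implies that the alternation rate should depend on what the two preceding responses were. A minimal sketch of that check follows. The sequence of guesses is hypothetical and was constructed only to illustrate the pattern (it is not Goodfellow's data), and the function name `conditional_alternation` is our own.

```python
from collections import defaultdict


def conditional_alternation(responses):
    """Alternation rate on each trial, split by the two preceding responses.

    "after_repetition": the two preceding responses were the same.
    "after_alternation": the two preceding responses differed.
    """
    counts = defaultdict(lambda: [0, 0])  # context -> [alternations, total]
    for first, second, third in zip(responses, responses[1:], responses[2:]):
        context = "after_repetition" if first == second else "after_alternation"
        counts[context][1] += 1
        if third != second:
            counts[context][0] += 1
    return {context: alt / total for context, (alt, total) in counts.items()}


# Hypothetical series of heads (H) and tails (T) guesses.
guesses = list("HHTHTTHTHHTT")
rates = conditional_alternation(guesses)
for context, value in sorted(rates.items()):
    print(f"{context}: {value:.2f}")
```

On Skinner's account, the rate following a repetition should exceed the rate following an alternation. The sketch only tabulates the two rates and does not test whether the difference is reliable.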

In a psychophysical experiment 
using the method of constant stimuli, 
a tendency to avoid repeating the 
preceding judgment (especially the 
judgment of equal) was noted many 
years ago by Fernberger (69). A 
similar tendency was reported by 
Turner (218), by Arons and Irwin 
(6), and by Irwin and Preston (121). 
(Cf. 198, discussed above.) 

Day (46, as reported by Abelson, 
1) studied patterns in differential 
threshold responses to auditory stim- 
uli and found long successions of 
correct discriminations and of failures 
to detect differences. From his re- 
analysis, Abelson suggests the inter- 
pretation that the probability of de- 
tecting a difference is greater when a 
preceding difference has been heard 
and less when the preceding change 
was not heard. 


Other Work on Systematic Variability 


There are several related areas 
which will be mentioned but will not 
be comprehensively reviewed. 

Periodicity. The search for peri- 
odicity in performance has a long 
history. As early as 1905, Seashore 
and Kent (193) reported periodic 
waves in continuous mental work. 
Sarvis (188) concluded that rhythms 
were ascertainable in the time re- 
quired for continuously tracing mazes 
blindfolded. However, he did not 
solve the problem of objective tests 
for rhythmicity: he held that “pro- 
longed experience” was necessary in 
making judgments about the pres- 
ence of rhythms. 

Philpott has been concerned with 
this problem for more than twenty 
years. In an early monograph (172), 
he provided a history of older work 
on curves of output and attempted to 
demonstrate geometric periodicity. 
He has recently sought to relate his 
psychological constants to physical 
constants. Among those unconvinced 
by Philpott’s arguments and con- 
cerned about his failure to utilize 
statistical tests is Richardson (179) 
who tested a work curve considered 
representative by Philpott and con- 
cluded that its spikiness might be 
random. 

Oscillation. Spearman (203) pro- 
vided an extended discussion of oscil- 
lations in efficiency. He held that 
these are manifested in fluctuations 
of minimal sensory impressions, in 
fluctuations of mental output, and 
in rivalry such as reversible perspec- 
tives. He concluded that there is a 
general oscillation factor which can- 
not be explained by g or by persevera- 
tion. Tussing (219) found that the 
rates of fluctuation of four illusions 
increase with physical fatigue. 

Flügel’s work on oscillation is dis- 
cussed above. (Cf. also the section 
on Serial Order.) 

Vacillation. Another more or less 
tangential topic is vacillation in con- 
flict situations. A systematic dis- 
cussion is provided by Miller (151). 
His paradigm is based on the relation- 
ship between distance and strength 
of response tendency. When an ap- 
proach gradient and an avoidance 
gradient cross, vacillation of response 
occurs at the intersection. More 
vacillation is found in the stable 
equilibrium produced by two avoid- 
ance gradients than in the unstable 
equilibrium of two approach gradi- 
ents. More complex situations are 
also considered. 

Systematic covariation. The most 
extensive work on covariation is that 
of Cattell. He and his colleagues 
have reported several studies (27, 
28, 29) of P technique, the correla- 
tion of measures on the same indi- 
vidual made on a number of succes- 
sive occasions. Fluctuation of atti- 
tude seems to be associated with 
emotionality (low maturity). A cen- 
tral consideration in this work is the 
relative range of fluctuations in differ- 
ent functions and the influence of that 
range on the obtained relationships. 
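
The logic of correlating measures over occasions for a single individual can be sketched as follows. This is an illustration only, under stated assumptions: the three measures and their scores are hypothetical, the names `pearson_r` and `occasions` are ours, and the sketch covers only the first step of such an analysis, the intercorrelations over occasions, not any factoring that might follow.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length, at least 2 occasions")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A measure that never fluctuates cannot covary with anything.
        raise ValueError("a series with no variation has no defined correlation")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for ONE individual on eight successive occasions.
occasions = {
    "measure_a": [3, 5, 4, 6, 2, 5, 7, 4],
    "measure_b": [2, 4, 4, 5, 1, 4, 6, 3],
    "measure_c": [6, 3, 5, 2, 7, 4, 1, 5],
}

names = list(occasions)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        r = pearson_r(occasions[first], occasions[second])
        print(f"{first} vs {second}: r = {r:+.2f}")
```

The guard against a series with zero variation reflects the point made above: the range of fluctuation in each function limits the relationships that can be obtained.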

Holt (110), following a suggestion 
from Horn (111), correlated self- 
ratings and Szondi scores for one sub- 
ject over twelve trials. A large num- 
ber of self-rating items had high 
correlations over trials with Szondi 
factor m. 

Some covariation tendency for 
psychometric tasks was noted by 
Asch (7); the scores of each individual 
tended to vary together over time, 
suggesting a general efficiency factor. 
Our analyses of his data reveal no 
relationship between the extents of 
total variability on three different 
tasks. 

We have intentionally omitted any 
consideration of mood swings in 
both normal and pathological sub- 
jects. 

In summary, there is considerable 
evidence of the existence of more 
or less systematic variability. The 
most common finding is that among 
equally efficient response alternatives, 
a given response will differ from the 
immediately preceding one, and even 
from other very recent ones. This is 
particularly true where the response 
is a choice of means to the same goal. 
It also occurs among choices where 
knowledge of the accuracy of the 
choice is not available to the subject. 
There is no general agreement on the 
explanation for this reactive variabil- 
ity or other systematic variability. 

Type II variability can be viewed 
as a special case of Type I variabil- 
ity. Systematic variability occurs 
only when the alternative responses 
have comparable probabilities of 
occurrence and when the successive 
presentations of the stimulus are 
relatively close together in time. 


CHANGE OF RESPONSE WITH 
CHANGE IN STIMULUS OR 
SITUATION 


In this section, we shall consider 
Type III variability, which we de- 
fined earlier as variability in response 
with variation in the stimulus or in 
the situation. While this class is 
obviously very large, some of the 
research in this area has implications 
for our central problem. Of course, 
all variability is of this type if one 
takes Guthrie's position (89) that the 
exact situation is never repeated. 


Change in Situation Primary 


Variability in this category is 
basically the difference between two 
responses to the same stimulus in 
two different situations. 

A great deal of psychological re- 
search utilizes the basic design of 
comparing responses under two dif- 
ferent experimental conditions. The 
interest is generally in the mean 
change in total score or average per- 
formance rather than in absolute 
change on separate items. Further- 
more, the emphasis is usually on the 
group change, rather than on the dis- 
tribution of individual change scores. 
For example, Johnston (125) reports 
low but generally positive relation- 
ships between measures of adjust- 
ment and relative gain from test to 
retest when the retest was adminis- 
tered under stressful conditions. An 
illustration of a change score based 
on absolute changes in responses to 
items is Brownfain’s study (20) of 
the stability of the self concept under 
varied instructions. 


Change in Stimulus Primary 


While the concept of variability 
is closely related to the concept of 
rigidity, they can be distinguished 
by two differences in emphasis. In 
the usual paradigm for studying 
rigidity, the subject is presented with 
first one stimulus and then another, 
objectively different one, with the 
general situation more or less con- 
stant. The focus is on the extent to 
which the subject fails to change his 
response, i.e., upon the degree of 
invariability. In approaching vari- 
ability, on the other hand, the more 
typical study keeps both the stimulus 
and the situation constant and meas- 
ures the degree of variability, which 
may be excessively small or large. 
The second difference is in normative 
emphasis. Rigidity is conceived to 
be maladaptive, and hence as the 
opposite of adaptive, appropriate 
response tendencies. In studying 
variability, the adaptiveness of the 
change in response is secondary. 
Change may or may not be con- 
sidered desirable. On the one hand, 
fluctuations in quality or accuracy 
may be viewed as undesirable; on 
the other hand, the tendency to 
make use of more than one of several 
equally efficient alternative responses 
may be beneficial because it sustains 
efficiency by preventing boredom. 
Thus the problem of variability en- 
compasses and is more general than 
the problem of rigidity. 

Krechevsky and Honzik (135) 
found that individual rats which con- 
sistently chose the shorter of two 
paths in a maze could change, when 
the paths were interchanged, as 
rapidly as those showing more vari- 
ability on the first problem (cf. also 
131). In a study of human problem 
solving, Guetzkow (86) concluded 
that there are two distinct factors in 
set: susceptibility to set (tendency 
to acquire a set readily) and ability 
to surmount an acquired set. Church- 
man (33, p. 240) conceives of person- 
ality “as the measure (or measures) 
of typical inefficiency an individual 
displays in problem-solving,” due to 
his failure to drop an old method or 
his tendency to change to a new, less 
efficient method. 

Cattell and Tiner (30) have pro- 
vided a review of the literature on 
perseveration and a factorial study 
of structural rigidity. Most of their 
tests involved capacity to perceive 
stimuli in new ways, not tendency 
to respond differently. A further 
analysis of structural rigidity has 
been made by Cattell and Winder 
(31), who distinguish between fluc- 
tuations in goals and fluctuations in 
goal paths. Kleemeier and Dudek 
(129) sought a general flexibility fac- 
tor but failed to find it. However, 
their flexibility tests had little homo- 
geneity. 

A major monograph on rigidity is 
that by Fisher (70). For him, a meas- 
ure of rigidity is the number of equiv- 
alent alternatives which the subject 
demonstrates, in some behavioral 
way, that he can utilize. Once again, 
the emphasis is upon modes of re- 
sponse which are possible for the sub- 
ject, not upon variability in actual 
responses. Thus he used the number 
of objects liked in a given set of stimu- 
li, the number of possible alternatives 
accepted by the subject, etc. Several 
of his suggestions concerning rigidity 
may have analogues in variability: 
for example, his discussion of indi- 
vidual consistency of rigidity mani- 
fested on tasks of a given difficulty 
level as related to adjustment, and 
his distinction between inner rigidity 
when self-esteem is threatened and 
peripheral rigidity when not emotion- 
ally threatened. 


RELATIONSHIPS BETWEEN VARI- 
ABILITY AND OTHER 
VARIABLES 


Some Experimental Variables Re- 
lated to Variability 


In this section, we shall review 
evidence concerning the relationships 
between variability and each of 
several kinds of variables. Some stud- 
ies involving intra-organic conditions 
will be discussed in this first part. 

Brain lesions appear to be associ- 
ated with reduced variability. Hal- 
stead (95) noted that the average 
deviation of trials on critical fusion 
frequency trials is lower for frontal 
lobectomy cases. Reference has been 
made earlier to several pertinent 
studies (98, 130, 132, 133, 134). 
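
For readers unfamiliar with the average deviation as an index of intra-individual variability, a minimal sketch follows. The readings are hypothetical and are not Halstead's data, and the function names `average_deviation` and `score_range` are ours. The two series are chosen to have the same mean, so that the level of performance and its variability can be seen to be separate matters.

```python
from statistics import mean


def average_deviation(scores):
    """Mean absolute deviation of the scores from their own mean."""
    if not scores:
        raise ValueError("need at least one score")
    center = mean(scores)
    return mean([abs(score - center) for score in scores])


def score_range(scores):
    """Difference between the highest and lowest score."""
    return max(scores) - min(scores)


# Hypothetical repeated readings for two subjects on the same task.
steady = [40, 41, 39, 40, 40]
variable = [35, 45, 38, 44, 38]

for label, trials in (("steady", steady), ("variable", variable)):
    print(
        f"{label}: mean = {mean(trials)}, "
        f"average deviation = {average_deviation(trials):.1f}, "
        f"range = {score_range(trials)}"
    )
```

Both subjects average 40, yet the second is nine times as variable by the average deviation. The range tells a similar story but depends on only the two most extreme trials.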

The effect of shock on variability 
has been studied by several people 
(63, 99, 162, 187). Fairlie (66) dis- 
covered that shock at the choice 
point produced more fixation in rats 
entering the correct path than in 
rats entering an incorrect one. De 
Valois (50) offers the interpretation 
that fixation occurs when the rat 
cannot avoid the punishing shock. 
After discussing the studies (126, 
128, 164) that show the fixation 
persisting after the shock is discon- 
tinued, he emphasizes Farber’s find- 
ing (67) that the persisting fixation 
is caused by anxiety: when the anxi- 
ety state is removed, the fixation is 
ended. 

De Valois’ own experiment (50) 
is a major contribution to this area. 
He showed that in rats, more intense 
motivation is associated with lower 
variability, for both an approach 
motive (thirst) and an avoidance 
motive (shock). Increasing the moti- 
vation decreased the variability, and 
decreasing the motivation increased 
the variability. The latter effect was 
demonstrated when the original moti- 
vation was moderately high but not 
when it was very strong. De Valois 
does not accept Maier’s position, 
which includes the principle (148, 
p. 159) that a problem situation pro- 
duces stereotyped behavior in a 
frustrated individual but variable 
behavior in a motivated one. De 
Valois’ results agree with Elliott's 
finding (60) that increased hunger 
lowered variability but that the rats 
remained at a low level of variability 
subsequently when they were less 
hungry. 

In human subjects, increase in 
motivation may increase variability. 
Deese and Lazarus (47) obtained 
greater variability of performance on 
a Rotary Pursuit Test by making the 
task more important and by inducing 
failure stress. These factors also in- 
creased the interindividual differ- 
ences in variability. A measure of 
the variability of reaction time for 
binocular fusion under conditions in- 
volving emotional stress may be 
useful in selecting emotionally stable 
people (Brown University, 19). Us- 
ing only 14 subjects, Baker and 
Harris (8) found Rorschach correlates 
with an increase in variability of 
speech intensity under stress (threat 
of shock). It should be noted that 
these stresses used with human sub- 
jects are disruptive in part because 
no positive adaptation is possible. 

The picture at lower levels of 
motivation is less clear. J. G. Taylor 
(207) suggested that strength of 
drive is related to amount of spon- 
taneous activity, one aspect of which 
is behavioral oscillation. Solomon 
(201) reported an increase in alterna- 
tion with increase in effort required. 
On the other hand, with sets of two 
equal paths, Mowrer (158, Ch. 6) 
working with Orbison found that the 
longer the route, the less the vacilla- 
tion. This finding may be a function 
of increased time between trials 
and/or reduced amounts of stimulus 
satiation effects (cf. Glanzer, 77); 
the increase in energy expenditure 
may be unimportant here. 

In an experiment by Goodman, 
Moyer, and Bunch (84), rats were 
exposed to electric shock, air blast, 
or food deprivation before being 
dropped into water. These various 
conditions did not produce differences 
in variability of alley chosen to get 
out of the water, perhaps because the 
conditions were not sufficiently stress- 
ful. 

Several studies have noted changes 
in variability, both within one oc- 
casion and over occasions (1, 94, 159, 
207). 


Personality as Related to Variability 


Personality traits. Hall (94) found 
that rats showing more variability 
among five alleys of equal length 
also showed greater emotionality (as 
measured by defecation in the ap- 
paratus) and took somewhat more 
time per run. Emotionality might 
be construed as an index of unreduced 
tension. 

Frenkel-Brunswik (72) reported 
some provocative relationships be- 
tween trait ratings and fluctuations 
in ratings made semiannually. 

Cattell (23) confirmed his predic- 
tion that degree of change in senti- 
ments and attitudes would be nega- 
tively related to w, the general char- 
acter factor, as rated by peers. He 
explained the lower correlations of 
his measure of fluctuation with extro- 
version, emotionality, and mood 
swings in terms of the relationships 
between these variables and w. 
Fairly consistent results were ob- 
tained for children and for adults, 
using intervals of either one day or 
one month between the administra- 
tions of the inventories. The w factor 
has its highest correlations with small 
change in “deeper sentiments.” Simi- 
larly, Cummings (43) found that 
variability on self-ratings correlated 
negatively with persistence, w, and 
introversion. Subjects high on this 
kind of variability were rated by 
others as original, imaginative, and 
talkative; low subjects were regarded 
as conventional, thorough, and pug- 
nacious. Walton (222) demonstrated 
a relationship between steadiness of 
character and low oscillation on mo- 
tor and cognitive tests. 

From a study of weekly retests on 
the MMPI, Layton (136) concluded 
that variation is a function of the 
individual, not of his score relative 
to the group. Variation on single 
scales had no consistent relationship 
to mean score although some trends 
were in a plausible direction. 

Rosenzweig and Mirmow (184) 
found that degree of socialization 
was associated with trends on the 
P-F test, a trend being a shift from 
extrapunitiveness to impunitiveness 
during the testing. (This type of 
change is presumably due to the 
subject's reaction to his previous re- 
sponses.) 

Two papers (163, 173) correlated 
changes in questionnaire responses 
with test scores on adjustment. 
Glaser, however, has done a more 
thorough study (78) demonstrating 
that the correlations between level 
scores and consistency are zero for 
tests with ranges appropriate for the 
group tested. This criticism does not 
apply to Weber's study (224) in 
which variability of speed on psycho- 
metric tests was related to emotion- 
ality and submissiveness (as meas- 
ured by questionnaires). 

Several factor analyses including 
variability measures have been made. 
Brogden (17) found nonvariability 
among 30 trials on addition had an 
appreciable loading on a factor in- 
volving ability to work steadily and 
to resist distraction. In his synthesis 
of factor studies using objective 
personality tests, Cattell (26) notes 
several factors on which variability 
measures have loadings. Connor 
(34), using measures of both daily 
variation and immediate variation, 
found little relationship between 
temperament and variability meas- 
ures from which ability had been 
partialed out. 

Personality integration. In several 
sources, we find the suggestion that 
personality integration is negatively 
related to variability (cf. 23 and 125, 
discussed above; also 109, 204). 

A provocative paper by Smith 
and Klein (199) indicates a relation- 
ship between variability of speed on 
the Stroop color-name test and a 
type of adaptability which is ac- 
companied by lability of control. 

In their review of the Szondi test, 
Borstelmann and Klopfer (16) eval- 
uate the proposition that the picture 
categories in which changes in prefer- 
ence patterns occur represent the 
more unstable areas of personality, 
whereas categories representative of 
stable, basic need systems will show 
little change over time in selection 
pattern (cf. 49). Noting that David 
and Rabinowitz (45) found greater 
changes in choices for schizophrenics 
than for student nurses, Borstelmann 
and Klopfer conclude that variability 
in Szondi Test behavior occurs in 
the records of normal subjects and, 
to a greater extent, in the response 
of pathological subjects. For both 
groups, the “variability seems to be 
pervasive and not differential among 
test categories” (16, p. 124). 

Personality types and diagnostic 
categories. Pauli utilized a variety 
of measures in his analysis of the 
curve of performance for continuous 
additions and subtractions (166). In 
a second paper (167), he concludes 
that to assess character and aptitude, 
the total number of items tried and 
the total number correct are suffi- 
cient; the range and average devia- 
tion are unnecessary. Susukita (205), 
using Pauli’s methods, found he 
could distinguish between two Japa- 
nese character types: the inner- 
integrative or rigid had lower varia- 
tion in performance than the outer- 
integrative or labile. Variation was 
not related to age. Andé (4), how- 
ever, concluded that variability on 
psychophysical tasks could not be 
used for character diagnosis. Using 
instructions to sustain pressure on a 
dynamometer at a constant level, 
Eden (59) found that the graph of 
pressure varied for the different 
Jaensch types. In general, the J 
types had smooth curves while those 
for the S types were irregular. 

The variability of performance of 
normal and neuropsychiatric groups 
on various tasks has been compared 
by Eysenck and his colleagues (64) 
and by Roseman (183). 

The question whether psychotics, 
especially schizophrenics, are more 
variable than normals has been exam- 
ined in several papers. The earliest 
study was made by Gatewood (74) 
who found that dementia praecox 
patients were more irregular than 
normals. He suggested that the dif- 
ference was due to the patients’ poor 
attention and their lack of “thought 
control.” Seeking to determine 
whether schizophrenics were more 
variable because of defects in “psy- 
chological government,” Hunt (116) 
obtained self-reports from his sub- 
jects. He found schizophrenics were 
more variable and that, within that 
group, output was significantly re- 
lated to type of preparatory set taken 
by the subject. The greater variabil- 
ity in output was associated with 
greater variability in set toward the 
task. Two other possible governing 
factors had little or no independent 
effect. 

A similar study was carried out by 
Huston, Shakow, and Riggs (120). 
Their schizophrenics had higher 
means and intra-individual variabil- 
ity in reaction time than their control 
group. Since the schizophrenics 
showed lower means and variability 
in their second and third testing ses- 
sions than in their first, the authors 
suggest that schizophrenics may have 
a slower rate of adaption. Coopera- 
tion and variability were negatively 
related, but even the more coopera- 
tive patients were more variable than 
the controls. A second experiment 
tended to confirm the hypothesis 
that schizophrenic patients do not 
attain as high a level of preparation 
or set, show more variability in 
height of preparation, and do not 
maintain their best level of prepara- 
tion as consistently as do normal sub- 
jects. An earlier study by Shakow 
and Huston (195) had shown greater 
variability for both schizophrenics 
and manic-depressives than for nor- 
mals on speed of alternate tapping. 

Using tests similar to those of All- 
port and Vernon (3), Wulfeck (229) 
measured intra-individual consistency 
of performance by test-retest cor- 
relations between the second and 
third sessions. The average r for 
several tests was .81 for manic- 
depressives, .80 for normals, .75 for 
psychoneurotics, and .71 for schizo- 
phrenics. Schofield (191, 192) has 
examined the changes in MMPI 
responses following different thera- 
pies for normals, neurotics, and psy- 
chotics. Differences in the intra-indi- 
vidual variability of schizophrenics 
and controls with respect to physi- 
ological functioning are discussed by 
Hoskins and Jellinek (112). 

There have been a number of 
papers on consistency of intelligence 
test performance (e.g., 104, 174, 
175, 223). 


Variability or Variabilities? 


With almost no exceptions, each 
of the studies reviewed in this paper 
has examined variability on only a 
single measure. We have no definitive 
evidence yet on the generality of 
variability. What is the factor struc- 
ture of variability scores? Is there one 
general factor? Are there many 
common factors? 

There is reason to believe that 
variability will turn out to be a func- 
tion of the test stimuli and of the 
general situation. If so, the problem 
of the correlations between personal- 
ity and variability becomes more 
complex. We may have to deter- 
mine which personality traits are 
associated with each variability fac- 
tor or measure. 


SUMMARY AND CONCLUSIONS 


Intra-individual response variabil- 
ity refers to the change in an indi- 
vidual’s response from one time to 
the next. Variability under three 
broad classes of conditions has been 
considered: 

Type I. Spontaneous or aperiodic 
variability. The individual is pre- 
sented with the same stimulus in the 
same situation at two points in time. 
It is assumed that the initial presenta- 
tion and the initial response do not 
affect the second response, i.e., that 
the order of the responses is imma- 
terial. 

Type II. Systematic variability. 
Although the stimulus and the situa- 
tion are again unchanged, the second 
response is influenced by the first 
presentation, by the first response, 
or by both. A primary example is 
reactive variability such as response 
alternation. Explicitly excluded from 
this type are (a) change showing a 
monotonic relationship with time 
(due to learning, fatigue, etc.) and 
(b) such periodic and cyclical phe- 
nomena as diurnal variation. 

Type III. Change in response 
associated with change in stimulus 
or in situation. This comprehensive 
class includes all conditions not men- 
tioned above. Only a few problems in 
this class, such as rigidity, have 
pertinence here. 

Since the pure case, Type I, ex- 
cludes all external determinants of 
change, variability must come from 
within the organism. We have, 
therefore, pointed out that fluctua- 
tions of intraorganic processes or 
states have been demonstrated or at 
least accepted by many writers. 

Variability has been measured in 
many different ways. Furthermore, 
different variability scores may be 
based on different time intervals be- 
tween the responses being compared. 
Measures of variability from one 
occasion have reasonable intercor- 
relations with each other and can 
have high stability over time, especi- 
ally over short intervals. In general, 
we may expect the correlation be- 
tween two of these measures to be a 
function of both the time interval 
between them and the relative homo- 
geneity of the two tasks from which 
the scores were obtained. Such a 
score is itself much more subject to 
change over time than are conven- 
tional level scores. Variability meas- 
ures appear to be related to set to- 
ward a task and to degree of adapta- 
tion to the stimulus situation. 

Relatively little work has been 
done on Type I (spontaneous) varia-
bility, perhaps because it is rarely
seen in its pure form and because it is 
difficult to elicit in experimental 
situations. Nevertheless, it has been 
recognized in almost all kinds of 
behavior. 

Reactive variability (Type II) has 
been studied more thoroughly. Both 
motor responses in rats and verbal 
responses in humans show a tendency 
toward the nonrepetition of the 
preceding or very recent responses. 
It appears likely that the organism 
does not seek to avoid making the 
previous response but rather seeks 
to respond in such a way as to vary 
the total pattern of stimulation reach- 
ing it, including the stimulus pro- 
duced directly or indirectly by its 
own response. 

In examining some aspects of Type
III variability, it was suggested that
the concept of rigidity refers to vari- 
ability which is restricted or reduced 
to a level considered to be maladap- 
tive. 

Variability is probably decreased 
by shock and by very strong motiva- 
tion. Motivation at lower levels is 
less clearly or more complexly re- 
lated to variation in response, e.g., 
experimental stress may increase 
variability. 

The personality correlates of vari- 
ability, if any, remain to be estab- 
lished definitively. It is likely that 
variability is negatively related to 
persistence and “character.” Simi-
larly, we cannot state with assur-
ance the nature of the relationships
with neurosis and psychosis.

This paper has reviewed many
studies with more or less relevance
to variability. While sections of the
general area have been investigated,
few sections have been systematically
attacked. The phenomena require
further intensive studies.
Perhaps the most telling sociologi-
cal phenomenon in psychology is its
rapid professionalization during the
decade since the end of World War
II. In a scant ten years, an essentially
academic discipline has transformed
large segments of itself into a form of
public service, meeting in the process
the problems with which a service 
profession must cope: relationships 
with other professions, the formula- 
tion of ethical codes, provisions for 
internal policing of professional prac- 
tice, the development of training 
standards and training opportuni- 
ties, concerns about legislative recog- 
nition, the maintenance of public 
relations, and many others. 

One problem that cuts deeper than 
others, however, is seldom faced 
squarely. Psychology’s professional- 
ization has occurred largely as a 
response to a social demand intensi- 
fied and given justification, although 
not created, by wartime experience. 
Such a demand takes primary form 
as a market for new workers to deal 
helpfully with the troubles of un- 
happy people. Since the knowledge 
necessary to this task has not yet been 
fully discovered and developed, a 
need is created for a kind of substitute 
for knowledge, a literature of practi-
cal wisdom and summarized clinical
experience. And because the human
problems toward which professional
psychology is directed are genuinely
important and poignant, because
something crucial in the lives of his 
clients is at stake, the professional 
psychologist himself often is moti- 
vated to seize desperately on any 
technique or idea that has the appear- 
ance of usefulness. When the chips 
are down, as they generally are in 
professional practice, skepticism 
about one’s own resources is a luxury 
that few can afford. 

One outcome of this state of affairs 
is the current rift between “pure”
and “applied” psychologists. Possi-
bly a bit jealous of the social promi-
nence and income of their colleagues
in clinical and counseling positions,
research men and serious theorists
are likely to judge the literature and
oral pronouncements in these fields,
often with some justification, as soft-
headed and close to meaningless. On
the other hand, made defensive by 
such charges and harried enough by 
their applied responsibilities, the pro- 
fessionals tend to reply in hasty 
irritation that the work of the scien- 
tists is unfeeling and irrelevant. 

Actually, of course, social need has 
always outrun available knowledge, 
generally providing the essential spur 
to its pursuit. To say that profes- 
sional practice must rely on hunch 
and accumulations of uncontrolled 
experience is to say nothing deroga- 
tory so long as one knows what is 
happening. The significance of the 
present split in psychology is that 
the element of faction makes it 
harder for people to acknowledge the 
difference between knowledge and 
practical approximations, to work in 
concert for the replacement of the 
latter by established fact and soundly 
based theory, and to formulate from 
professional practice fruitful hypothe- 
ses for the advancement of psy- 
chological science. 

The professionalization of psy- 
chology, then, provides not only 
opportunities for the discipline’s be- 
ing of service to a needful public but 
also opportunities for the enrichment 
of psychological science through the 
hypothesis-forming potential of ap- 
plied work. Its danger lies in a 
splitting of science and profession to 
the detriment of both. The extent 
to which this divisive tendency has 
permeated or been controlled in the 
field of counseling psychology and ad-
justment can be estimated from these
18 recent books.


CONCEPTS OF ADJUSTMENT 


Willy-nilly, what counselors do in 
their relationships with clients is 
determined in part by their implicit 
or explicit notions of what constitutes 
effective personal adjustment and
their unformulated or articulate con- 
cepts of how it is attained. Most 
counselors, clinicians, and social psy- 
chologists find themselves involved 
in difficult and ongoing struggles to 
clarify such questions as the relation- 
ship of social conformity to the grati- 
fication of individual needs, of articu- 
lation with the cultural group to 
personal spontaneity and novelty, 
and of the acceptance of social regula- 
tion to the promptings of one’s indi- 
vidual experience conceptualized as 
conscience. 

Shaw and Ort treat these problems 
courageously and in a sophisticated 
fashion in a book that blends effec- 
tively clinical experience with re- 
search evidence and theories that 
have had some brush with the labora- 
tory as a proving ground. They argue 
that the individual’s adjustments are 
best understood as ways of interacting 
with other people and as products of 
primarily social experience. This em-
phasis on one’s history and current
behavior in interpersonal, social,
and cultural contexts is formulated 
within a conceptual framework of 
reinforcement learning theory in 
which personality development and 
functioning is comprehended as the 
result of rewards and punishments 
experienced in the social learning 
environment. 

On the other hand, they are most 
mindful of the problem of adequacy 
in personal adjustment and wrestle 
manfully with the criterion of mere 
social conformity to which a sociologi- 
cal point of departure often leads. 
Their resolution of the difficulty is 
hardly a final one. Shaw and Ort 
rightly hold that the ability to act 
in accordance with social standards 
is an essential part of effective inter- 
action with one’s inescapable inter- 
personal world but that it must be 
balanced by something they call 
“integration.” This factor is defined
as the maintenance of oneself in one’s 
environment, and behavior disorders 
are conceived as deviations from 
adequate integration. 

This concept involves two inter- 
locking difficulties. The first is the 
theoretical embarrassment that de- 
rives from the attempt to fit the no- 
tion of self-maintenance, deriving 
from a phenomenological tradition, 
into the objectivist behavior theory 
that is the conceptual groundwork 
for Personal Adjustment in the Ameri- 
can Culture. This task of unifying 
different psychological theories is a 
most worthy one, but the attempt 
here results in a weakening of the 
rigor of reinforcement theory and a 
loss of the intuitive impact that 
seems to have made phenomenology 
attractive for many applied workers. 
Second, the effort to deal with inte- 
gration apart from conformity lands 
the authors in the problem of ac- 
counting for integrative behavior 
that runs counter to their social ex- 
perience, of developing explicit cri- 
teria for judging when nonconformity 
represents adequate adjustment and 
when it suggests mere rebellion or 
lack of social awareness, and of pro- 
viding a psychological basis for under- 
standing integrative adjustments 
that lie outside cultural norms. 

Two comments seem in order. The 
first is that one of the sources of un-
certainty and a kind of fuzziness of
thought in much of applied psychol-
ogy may be the necessity of involve-
ment in questions which are essential-
ly ethical or philosophical in char-
acter. In a sense, Shaw and Ort ac-
cepted the task of defining “the good
life” or “the elements of good con-
duct.” A. E. Taylor in his life of
Socrates (7) makes much of the
Socratic “sense of the importance of
implicit obedience to lawful author-
ity” without falling into the “vice of
exalting the mere letter of the law
above its spirit.” It would seem that
Shaw and Ort’s courageous book, for 
all its labors, says little more on this 
difficult topic of the balance between 
conformity and integration. There is 
a temptation to say that psychology 
should stick to its traditional last, 
but as May (5) and Mowrer (6) point 
out, the question of what kind of 
behavior is associated with human 
happiness is a legitimate one for 
science to ask and an inevitable one 
for the psychological professions. 

The second point to be made is 
that few books illustrate as well as 
Shaw and Ort’s how far a knowledge- 
able fusion of experimentally based 
theory and clinically based hunch can 
achieve two goals. One is the clarifi- 
cation of processes observed in the 
consulting room and in the field. 
Defensive operations, the phenomena 
of age grading and other forms of 
role taking, masculinity and feminin- 
ity, and occupational strivings be- 
come less complex and communicable 
in more manageable terms when ana- 
lyzed according to a theoretical sys- 
tem the strengths and weaknesses of 
which are assessable in part through 
laboratory tests. The other is the 
enrichment of a theoretical structure 
by seeing how well it can be made to 
cover practical observations. Even 
though it was written as a relatively 
low-level textbook, this volume will 
repay close reading by those inter- 
ested in developing research problems 
in that important area that lies be- 
tween restricted theory and loose 
practice. 

The volume by Katz and Lehner, 
on the other hand, is theoretically 
much less ambitious. Its pretensions 
lie in the direction of explaining 
human behavior to unsophisticated 
readers and of demonstrating to 
these readers how they may achieve 
happier adjustments in their own 
lives. As such, Mental Hygiene in
Modern Living is a kind of cross 
between text and self-help book. 
This hybridization could conceivably 
make for certain strengths. For 
example, the encyclopedic coverage 
virtually assures most readers of 
finding something useful in its pages. 
Like an encyclopedia, however, the 
book lacks system and thoroughly 
confuses practical wisdom with veri-
fied knowledge and substantiated
theory. Nowhere does it come to 
grips with the problem of what con- 
stitutes positive adjustment in any 
cogent way, and the discussion of 
maladaptive behavior, including the 
psychoses, leads one to ponder the 
observation that clinicians and coun- 
selors so often seem more at home 
with psychological ills than with 
psychological health. Even the 
technical vocabulary seems wanting 
in words to describe effective adjust-
ment and the conditions which de-
termine it. It is little wonder, al-
though a trifle alarming, that so little
research is done and so little theoriz-
ing devoted to the development of
personal assets when professional at- 
tention is directed so strongly to 
pathology. One unfortunate result is 
that discussions of positive personal- 
ity growth have, as they do in this 
book, the leaden ring of clichés 
sounding through them. Saddest of 
all, this fault cannot be charged 
directly to the authors. Citing a 
wide literature, they refer to virtual- 
ly no studies directly concerned with 
psychological health; one guesses 
that they found none. For counseling 
psychology in particular, with its 
official emphasis (1) on the positive 
and the “normal,” this absence of
careful thought and empirical evi- 
dence on the problem of what con- 
stitutes normality and its determi- 
nants seems to be a deadly realm of 
ignorance.

Lindgren makes some attempt to
deal with this worrisome gap in 
knowledge and theory in a readable 
book that pays little attention to the 
major forms of psychopathology and 
concentrates on the understanding of 
“ordinary people.” His approach is
not discernibly systematic, like Shaw
and Ort’s, but neither is it a wide-
ranging mass of unconnected proposi-
tions and unorganized bits of infor-
mation, like Katz and Lehner’s.
Refreshingly, Lindgren recognizes hu- 
man resiliency, the capacity to absorb 
considerable amounts of frustration 
without serious damage, as a primary 
factor in mental health; and he also 
deserves much credit for bringing 
thinking back to the list of psy- 
chological functions of importance to 
counselors. Dynamic psychology, 
in its reaction to the rationalist 
tradition and its proper underscoring 
of affective and conative determi-
nants of behavior, has forgotten until
very recently Freud’s (4) remark,
“The voice of the intellect is a soft
one, but it does not rest until it has
gained a hearing.” It is to Lindgren’s
credit that he gives due emphasis to 
the role of thought in solving com- 
mon human difficulties without fall- 
ing back on the notion that the prob- 
lems of people are always at bottom 
intellective affairs. 

Nevertheless, the emphasis on 
pathology that has characterized 
modern psychology, especially in its 
professional aspects, seems to pro- 
vide the undercurrent for this, as for 
other, volumes. The conception of 
man implicit here is one that depicts 
him as being buffeted by his circum- 
stances, protected only by his degree 
of native toughness and his wisdom 
in finding “therapeutic” experiences
either within or without the consult- 
ing room. Everyday social life, work, 
and religion (conceived naturalistical- 
ly as a form of ready communication 
with a group and its leader) are all
discussed as forms of therapy, i.e., 
as corrective. This view may be quite 
correct, but it is not unchallenge- 
able. One can legitimately raise the 
question of whether work and per- 
sonal relationships cannot have posi- 
tive effects in themselves, not as cor- 
rectives to unavoidable ills, just as 
one can ask if the inescapable frus- 
trations and compromises of daily 
living necessarily require therapeutic 
relief in mature people. This rather 
different idea of human functioning 
might lead to a re-evaluation of ex- 
periment and theory and the develop- 
ment of a fruitful new line of psycho- 
logical research on the nature and 
antecedents of normality. 

One might expect that these con- 
cerns for the nature of adequate ad- 
justments, involving something very 
close to ethical and philosophical 
considerations, might find more ex- 
plicit attention in two books written 
from religious points of view. 
Hoyles’ book on delinquency is by 
a minister of the Church of England, 
and Recktenwald’s discussion of guid- 
ance procedures is by a Roman 
Catholic. The first is a humane and 
intelligent argument for rehabilita- 
tive rather than punitive treatment 
of juvenile offenders. The second is a 
workmanlike and comprehensive 
brief overview of guidance tech- 
niques in school settings. Neither 
contains anything particularly novel, 
and Hoyles seems more preoccupied 
with presenting a challenge to the 
Anglican Church than with analyzing 
the character of criminal adjustments 
among young people. The important 
thing about them which justifies 
considering them together here is 
their mutual lack of attention to the 
problem of what constitutes ade- 
quate or integrative behavior. Ex- 
cept for a very few random pages, 
Recktenwald’s book could have been 
written by a non-Catholic, and
Hoyles uses the language of moral 
theology to say essentially the same 
things that anybody would say who 
conceives of delinquency as a form 
of behavioral pathology rather than a 
kind of wilful violation of social 
tenets. Neither grapples as frankly
as do Shaw and Ort, for example,
with the problem of adequate adjust-
ment or the behavior patterns associ-
ated with “the good life.”


RESEARCH STUDIES 


If the nature of integrative ad- 
justment is given little attention in 
these books on mental hygiene, it 
is not surprising to find a similar lack 
of concern in the three research mono- 
graphs by Hathaway and Monachesi, 
Craig, and Griffiths. The noteworthy 
thing about these three very different 
publications is that they represent 
the vigor with which sound empirical 
work is being pursued in a profes- 
sional field. All three, however, have 
only remote connections with com- 
prehensive theory, which may reflect 
again the lack of articulation between 
the scientific and the systematic tra- 
dition and the professional one in 
psychology generally. Their diver- 
sity is also worth noting as indicative 
of the number of problems accepted 
as the investigative responsibility of 
counseling psychologists and those
concerned with school guidance. 

The Hathaway and Monachesi 
monograph is an exceptional piece of 
work for two reasons quite apart 
from the high level of research sophis-
tication that marks it. First, con-
cerned with the prediction of delin-
quent behavior in children, it avoids
the pitfalls of retrospective data. 
Instead of studying the attributes of 
those already guilty of delinquent 
acts as opposed to those of youngsters 
who are not delinquent, Hathaway 
and his co-workers began with the 
examination of large numbers of
subjects below the age at which 
delinquency tends to occur. Predic- 
tions were made at that point, and 
their accuracy was determined by a 
close follow-up of the children. 

Second, this little book is a splen- 
did exhibition of what can be done 
with structured personality tests. 
Using the MMPI, these researchers 
concerned themselves with profile 
patterns rather than individual 
scores, thus introducing a regard for 
the complexity of test responses simi- 
lar to that insisted upon by Ror- 
schach workers and the users of other 
projective techniques. The method of 
coding profile patterns, however, 
does not sacrifice the rigor and exact- 
ness associated with quantitative 
scores but usually purchased at the 
cost of a good deal of information 
about the respondents. In short, 
the methodological implications of 
this study are that objective person- 
ality scales can be used in such a 
way as to retain precision without 
exorbitant cost in what Cronbach 
(2) has called “bandwidth.” In a 
field where assessment procedures 
and predictive devices often yield 
blurred and fuzzy results, the model 
provided here assumes particular 
importance. 

As for findings, Hathaway and 
Monachesi present convincing data 
to indicate that adult test patterns of 
the amoral psychopath and the hypo- 
manic individual tend to occur with 
predictively significant frequency 
among predelinquents, whereas the 
occurrence of test patterns character-
istic of adult neurosis seems to exer-
cise an inhibiting influence on the
development of delinquent behavior. 
Similarly, those test patterns which 
show no high deviations and are 
generally considered as “normal” are
remarkably predictive of nondelin- 
quent adjustments. This study 
strongly suggests that the interpre-
tation of delinquency as a form of
neurotic behavior overlooks funda- 
mental differences in neurotic and 
criminal adjustments and implies that 
the dynamics of delinquency are not 
to be found in repressions and self- 
derogating tendencies. 

On the other hand, there is little 
in this monograph to shed much light 
on the antecedents of delinquency or 
its modification. Meaningful investi- 
gations of the experiential determi- 
nants of delinquent behavior, of the 
relevant learning environments in 
which delinquent patterns are ac- 
quired, are yet to come. Until such 
inquiries are undertaken with the 
same vigor, sophistication, and in- 
sistence on exactness that one finds 
in this prognostic study, it is unlikely 
that a useful theory of delinquent 
adjustment will be developed to re- 
place current clichés, hunches, or 
scattered bits of knowledge. 

Griffiths’ study of children's be- 
havior difficulties is not concerned 
with delinquency. Rather, he was 
concerned with the kinds of behavior 
problems that children themselves 
think they have. Studying young- 
sters from six to fourteen, he finds 
that awareness and acceptance of 
social rules and regulations increase 
with age, that younger children tend 
to be aware of overt aggression as a 
source of trouble but only later to
develop a sense of uneasiness about 
submissive and withdrawing tend- 
encies in themselves, that children 
generally are surprisingly aware of 
how both parents and teachers would 
like them to change their behavior, 
and that children from the middle 
socioeconomic group are more con- 
forming to adult norms than those 
from other classes. 

The most outstanding thing about 
this study is its documentation of the 
effects of socialization. As age
(which means social experience) in-
creases, children become increasingly
sensitive to adult demands and norms 
of conduct and even at relatively 
early levels show surprising insight 
into adult desires to have them 
change their behavior in certain 
identifiable ways. Moreover, children 
seem to discriminate fairly early the 
different demands made of them by 
different adults. Fathers, for exam- 
ple, are perceived as persons who 
primarily ask that they not be dis- 
turbed. Teachers are people pre- 
dominantly interested in an orderly 
and efficient classroom, although 
youngsters seem a bit more aware of 
the interest of teachers (in contrast 
to parents) in the modification of 
withdrawing and submissive traits. 
Mothers, like teachers, want smooth-
running homes and evaluate chil-
dren's behavior accordingly.

Methodologically simple, this mon-
ograph provides a wealth of informa-
tion about children’s evaluations of
their own behavior and relevant to 
an increased understanding of the 
socialization process. Again, how- 
ever, the findings are not related to 
any generative kind of theory that 
would lead readily to more syste- 
matic research on the socialization 
process or the antecedents of chil- 
dren's attitudes. The importance 
of this problem also requires com- 
prehensive study to deal with such 
questions as the influence of per- 
ceived adult norms on peer relation- 
ships, the development of conscience, 
and the development of personality. 
For example, does the child who be- 
comes aware of adult conduct norms 
at an early age tend to develop a 
greater degree of responsiveness to 
authority than the youngster who 
acquires this kind of awareness later? 
To what extent is this kind of aware- 
ness related to the type of rewards 
and punishments typical of the home 
and to the pattern of child-parent
relationships generally? Griffiths’ 
study provides an excellent point of 
departure for theoretically important 
investigations of the antecedents of 
all kinds of adjustment problems, in-
cluding that of “adequate” or integra-
tive adjustment patterns.

Craig’s monograph on guided 
learning is in the tradition of trans- 
fer-of-training experiments of the 
Thorndikian type, but its implica- 
tions are somewhat broader than 
most such studies. Working with 
recent college graduates as subjects 
and using a task consisting of multi- 
ple-choice verbal test items, Craig 
found that giving learners a short 
statement of principles common to a 
group of items reduced errors, in- 
creased the efficiency of solutions, 
and aided the process of discovering 
the basis for correct response to other 
items. This effect became more pro- 
nounced as a function of the difficulty 
of the learning situations. While 
conceived within narrow and well- 
controlled laboratory conditions, the 
experiment suggests that interperson- 
al as well as intellectual problem solu- 
tion might well be facilitated by the 
development of principles common to 
many situations through participa- 
tion in discussion groups, orientation 
classes, and individual guidance con- 
ferences. Growing out of the tradi- 
tion of law-of-effect learning studies, 
Craig's inquiry has implications that 
amount to ready-made hypotheses 
for counseling psychologists to put to 
test. This kind of research would 
have the considerable advantage of 
being tied to a comprehensive theory 
of behavior that could be expanded 
in two directions: application in a 
knowledgeable kind of way to signif- 
icant social problems and theoretical 
extension to affective and conative 
spheres.


COUNSELING TECHNIQUE


The great bulk of these books is 
devoted to technique, the methods to 
be employed by the practical coun- 
selor. This state of affairs is perhaps 
a function of many factors. One is 
the historical emphasis on practicality 
and tangible action in American 
culture, one current manifestation 
of which is the spate of how-to-do-it 
books in many areas. Another is the 
intensity of the demand out of which 
the professionalization of psychology 
has grown. With a greater awareness 
and acknowledgment of human prob- 
lems, there is a greater insistence 
that preventive and remedial treat- 
ment be supplied by somebody. One 
outcome seems to be a crop of man- 
uals and books of instructions for 
those who are pulled into this vacuum 
of need, often without the funda- 
mental training upon which profes- 
sional practice must rest. 

The striking thing about the pres- 
ent collection of nine volumes is their 
similarity and degree of overlap in 
content. The books by Arbuckle, 
Humphreys and Traxler, Knapp, 
Little and Chapman, Recktenwald, 
and Warters all say essentially the 
same things in essentially the same 
ways. One marvels at a market that 
can absorb such comparable publi- 
cations. It is not that these books 
are lacking in workmanship or com- 
petence; they have both qualities. 
But they are all very much in the 
same vein of describing activities 
by which a counselor can keep him- 
self busy doing technical things. 

Of these technical things, two are 
given major predominance. One is 
testing, the other is record keeping. 
Both, of course, are important, but 
one wonders if lists of psychometric 
instruments and lengthy discussions 
of cumulative records are the core 
of the counseling process. Counselors 
certainly need methods of sharpening
the precision of their observations 
and ways of estimating how a given 
client or student compares with his 
fellows. Tests are very relevant. 
Likewise, counselors must know how 
to keep account of the development 
and progress of those to whom they 
are responsible. Record-keeping de- 
vices are very much in point. The 
present manuals, however, devote 
very little space to the ways in which 
these technical aids can be used to 
advance the aim of the counseling 
process, the helpful modification of 
client behavior. Neither is there much 
space afforded to the use of the data 
collected by such instruments in the 
advancement of knowledge. If coun- 
seling psychology is to remain a part 
of psychology and not to degenerate 
into mere technician status, then 
surely research and careful thought 
about human problems as they are 
encountered on campuses, in indus- 
try, and in the consulting room gener- 
ally must be willingly assumed func- 
tions of practitioners of the coun- 
selor’s craft. 

If this point can be regarded as 
debatable, however, another can 
hardly be. The essence of the coun- 
selor’s job is the modification of 
client behavior through face-to-face 
contacts. It is counseling. The pro- 
portion of pages in six fat volumes 
concerned directly with this face- 
to-face process of behavior modifica- 
tion is of the order of eight per cent! 
One suspects that under professional 
pressure it is somehow easier to be 
occupied with testing programs and 
the maintenance of records than with 
direct personal service. One also
suspects that the lack of real knowl-
edge about the counseling process is
an overwhelming if inarticulate deter-
minant of this strange fact. Is it
possible that the how-to-do-it man-
uals are premature in many ways
and that more research and more
thought on the counseling process
and its outcomes are prerequisites
yet to be achieved for preparing
truly meaningful books on tech-
nique? The profession is young,
and its undergirding of basic knowl- 
edge may be too weak still to sup- 
port such a structure of textbooks. 
The same comments apply to Arse- 
nian’s little book, a set of instructions 
for YMCA secretaries who find them- 
selves involved in counseling rela- 
tionships. 

Two books are exceptions. One is 
Driver's discussion of the technique 
of multiple counseling, the other 
Tyler’s sophisticated and informed 
The Work of the Counselor. The
first is an extensive and interesting
presentation of small-group discus-
sion methods of modifying behavior
drawn primarily from a fusion of
client-centered psychotherapy and
group dynamics. The book is note-
worthy because of its extensive case
excerpts, its serious if rather loose 
attempts to evaluate procedures and 
outcomes, and its step-by-step de- 
scriptions of multiple counseling 
groups and their development. Any- 
one interested in group counseling 
methods will find rewarding reading 
here, and the challenge to rationalize 
the method and to subject it to rigor- 
ous investigation is considerable. Re- 
search hypotheses abound in its 
pages, and the book's freedom from 
a doctrinaire tone invites the test of 
empirical inquiry. 

Tyler’s volume, by far the best of 
the “practical” books on the present
list, frankly acknowledges the status
of counseling as an “art,” dependent
more on the accumulation of shared 
clinical experience than on research 
evidence and tightly constructed 
theory. Nevertheless, she argues that
the rapidity of improvement in coun-
seling procedures must rely on the
soundness of relevant investigation 
and that intuition can be made ar- 
ticulate through research. Her chap- 
ters, therefore, are organized in terms 
of what counselors appear to do and 
think in their practice but are supple- 
mented with research summaries re- 
viewing the relevant research litera- 
ture and clarifying issues on which 
research may throw some light. Un- 
like other authors, Tyler’s greatest 
emphasis is on the interview, and her 
understanding of the relationship be- 
tween observation on the one hand 
and behavioral modification on the 
other is insightful and seminal while 
appropriately humble. 

Oddly, this fine book pays virtually 
no attention to personality dynamics 
or the problems of clarifying the con- 
cept of adjustment. This is the more 
remarkable since Tyler's is the only 
one of these volumes that makes it 
explicitly clear that the profession of 
counseling cannot be successfully fol- 
lowed by one whose only equipment 
is a bag of tricks, a knowledge of 
available tests, and ways of keeping 
records. Much of what she says im- 
plies that counselors require more 
than anything else an ever-develop- 
ing knowledge of learning, percep- 
tion, motivation, and human devel- 
opment, and the failure of her book 
to concern itself with these substan- 
tive issues in some degree is surprising 
and disappointing. There is a limit, 
however, to which an author can be 
taken to task for not writing a differ- 
ent book, and Tyler's contribution is 
too informed and too well based on 
general psychological knowledge and 
in the traditions of psychological 
science to permit much legitimate 
fault finding.

THEORETICAL CONTRIBUTIONS


As a corrective to the how-to-do-it
manual, professional psychology has 
two major theoretical alternatives. 
It can attempt to systematize its ob- 
servations and to develop a formal 
body of propositions independently 
of the rest of psychological science, 
risking a fragmentation and divisive 
force that might well deprive the pro- 
fession of the intellectual and scien- 
tific bases on which most mature pro- 
fessions rest. Or, it can formalize its 
observations in terms of general psy- 
chological theory, a process which 
demands the assumption that such 
processes as learning, motivation, 
and perception are essentially similar 
whether studied in the laboratory or 
in the counselor’s office. 

The little book by Pepinsky and 
Pepinsky is an outstanding example 
of the latter possibility. With even 
more cogency and creativeness than 
Dollard and Miller’s Personality and 
Psychotherapy (3), this work illus- 
trates the advantages of relating 
clinical experience to a formal and
elegant theory derived from experi-
ment and the theorist’s study. The
Pepinskys both apply and enlarge a 
relatively rigorous Hullian model by 
using it to account for anxiety and 
unproductive behavior as these con- 
cepts are employed descriptively by 
counseling practitioners. With a 
minimum of concern for specific 
techniques, Counseling: Theory and 
Practice demonstrates how a coun- 
selor can better understand and com- 
municate with his clients if he is 
thoroughly in possession of a compre- 
hensive theory that is well fortified 
by careful research than if he at- 
tempts to rely on a bag of technical 
tricks or the unrelated and some-
times inconsistent assertions that oc-
casionally pass for theory among
those whose training is deficient in 
general psychology. 

Indeed, a major thesis of the book 
is that professional practitioners must 
work both as applied clinicians and as 
scientists, each role facilitating the 
other. Experience as a counselor pro- 
vides observations that allow the in- 
ductive formulation of general laws. 
Experience as a scientist permits the 
testing of these laws and the determi- 
nation of which are useful, which re- 
quire modification, and which must 
be scrapped as unproductive of either 
new insights into the counseling 
process or new hypotheses for re- 
search. Practice unleavened by 
theoretically oriented research re- 
sponsibility leads to a softness of 
principle and a blurring of theoretical 
precision that is likely to reduce 
counseling to a matter of mystique 
and uncommunicable intuitions.
Theory-oriented research without
familiarity with practical problems
and clinical events risks a narrowing
and a rarefying that make theory
lose significance and lose
comprehensiveness.

Sanderson's Basic Concepts in Vo- 
cational Guidance is a very different 
matter, more illustrative of the first 
alternative, the construction of the- 
ory from professional observations 
without regard to more general psy- 
chological propositions. It is possible 
to read this volume without being 
reminded that human beings learn 
and perceive and manipulate symbols 
in thought or social communication! 
And yet the book is a good one, in- 
fused with the attitudes of science if 
not its general psychological content 
and reflecting a careful and thought-
ful evaluation of a wealth of litera-
ture and counseling experience. Con-
trary to the how-to-do-it books, this
one rests on the proposition that
counselors rely too heavily on “diag-
nosis” (testing and record keeping),
and too little on “the helping process
itself” (behavioral change through
face-to-face contact). Sanderson 
shares with the Pepinskys, then, the 
belief that counseling practice is facil-
itated more by a coherent body of
ideas by which the clinician can un-
derstand his clients and his proce-
dures than by a clutch of mechanical
methods. 

Further, his conception of the 
counseling role is a broad one, taking 
fully into account the motives and 
affects that revolve around the status 
aspects of occupational adjustment 
and the variety of noneconomic re- 
wards bound up with work. But this 
very breadth of conception leads to 
disappointment when the reader finds 
nothing about the general psychology 
of personality or the research evi-
dence and tentative principles formu-
lated from it that bear on personal
development, the making of deci-
sions, or the socially acquired drives
that determine so much of human ac-
tivity. The alternative of reasonably
precise but discrete and unsystema-
tized concepts drawn from counseling
experience itself may help to clarify 
the counselor’s present work, just as 
Sanderson's stress on understanding 
the helping process as contrasted 
with reliance on technique may help 
to increase the actual amount of 
service given to clients. It is most un- 
likely, however, to advance the field, 
to stimulate a search for new knowl- 
edge of relevance to the process or its 
outcomes, or to increase the under- 
standing of those general human 
functions which are essentially the
subject of the counselor’s concern,
just as they are the subject of interest 
of friends, experimenters, and vir- 
tually all others who find the study of 
man inevitable as well as fascinating. 

Neither of these two competent 
books deals with the problem that 
seems so central even while it is so 
thoroughly ignored. What consti- 
tutes effective adjustment or inte- 
grative behavior? Both Sanderson 
and the Pepinskys are essentially 
silent on the question, although the 
latter skirmish with it briefly in a 
provocative chapter on the assessment 
of counseling processes and outcomes. 
It seems somehow unlikely that fruit- 
ful outcome-research can be accom- 
plished until this basic element in 
establishing a criterion is thought 
through. 


SOME GENERAL COMMENTS


If the present list of books is a fair 
sample of the present state of counsel-
ing psychology, there is room for
worry over the effects of rapid profes-
sionalization on psychology as sci-
ence. With the public demand for
service high, there is apparently a 
considerable interest in publications 
dealing with techniques, less in books
seriously occupied with knowledge
and understanding. There seems to be 
a tendency for practice to divorce it- 
self from science that could be detri- 
mental to both. Should counselors 
become only technicians and general
psychologists only runners of rats
and manipulators of brass instru- 
ments, both psychology and a needful 
public will suffer. The professionali- 
zation of large elements in psychology 
is welcome not only because it pro- 
vides an opportunity to serve trou- 
bled people but because it opens new 
vistas for psychological research and 
the understanding of behavior. The 
problems are large enough so that 
their solutions probably lie in the col- 
laboration of practitioner and rat 
man or tachistoscope operator. With- 
out such collaboration, productive 
solutions seem far distant. 

But there is also evidence to justify 
ending this review on other than a 
Cassandra note. The books by the 
Pepinskys and Shaw and Ort and the 
three research monographs all indi- 
cate an investigative vigor and a re- 
tained relationship with general psy- 
chology that holds much in the way 
of theoretical promise, just as Tyler’s 
volume and, to a lesser degree, San- 
derson’s suggest that careful thought, 
a proper appreciation of research, and 
a knowledge of available evidence 
from the laboratory and the field still 
have a stout influence on practice. 
Division is by no means inevitable, 
and perhaps only a little awareness of 
the issues is necessary to insure the 
development of a useful profession 
firmly rooted in psychological knowl-
edge and rigorously developed the-
ory.

BOOK REVIEWS


CARMICHAEL, LEONARD. (Ed.) Man- 
ual of child psychology. (2nd Ed.) 
New York: Wiley, 1954. Pp. 
ix+1295. $12.00 


The second edition of the Manual 
presents an excellent picture of child 
psychology. In this respect it is a bet- 
ter book than the first edition, which 
in the reviewer's opinion was out of 
date when it was issued.¹ This does
not mean that this is a definitive 
work. It is, rather, an expansive, fac- 
tious book. But it would be surpris- 
ing if it were otherwise, considering 
the unfinished state of the frontier 
area with which it deals. At the same 
time it is an ambitious book that 
faithfully reflects the vigor and aspir- 
ations of this youthful science. The 
title of the volume, itself, expresses 
the ambitious tone of much of the 
book. A manual, by dictionary defi-
nition and by common usage, is a
book of reference which can be con-
veniently carried in the hand. But
this book’s four and one-half pounds
make it too big to be easily handled,
and it is a book to be studied and
argued with, not one to be referred to 
on the run for authoritative answers 
to standard problems. 

The book is similar in organization 
and authorship to the first edition, 
seventeen of its nineteen chapters 
having the same titles and sixteen the 
same authors as the first edition (two 
chapters have different junior au-
thors). Chapter changes from the
first to the second editions are: “The
Feebleminded Child” (Doll) is incor-
porated into a more inclusive chapter
called “Psychopathology of Child-
hood” (Benda); “Maturation of Be-
havior” (McGraw) is omitted; “Social
Development” (H. H. and G. L. An-
derson) is added; “Adolescence” is
written by Horrocks in the place of
Dennis. A number of chapters in the
new edition reprint large sections
from the previous one in unchanged
form. To be exact, 70 per cent or
more of the following chapters are re-
printings of the first edition: “The
Onset and Early Development of Be-
havior” (Carmichael), “Animal In-
fancy” (Cruikshank), “The Neo-
nate” (Pratt), “Physical Growth”
(Thompson), “The Ontogenesis of
Infant Behavior” (Gesell), “Learning
in Children” (Munn), “Mental
Growth” (Goodenough), “Research
on Primitive Children” (Mead), “Be-
havior and Development as a Func-
tion of the Total Situation” (Lewin,
with a supplement by Escalona), and
“Gifted Children” (Cox). About half
of this edition of the Manual is a re-
printing of the first edition, and about
half involves major revisions or new
topics.

1 Roger G. Barker. Manual of child psychol-
ogy, a special review. Psychol. Bull., 1947,
44, 162-170.

The ten chapters listed above will 
not be considered in detail here. The 
reader is referred to the earlier review 
for comments on them, as their anal- 
yses of the topics under consideration 
are essentially unchanged. These are 
research areas which the authors see 
as having been on a plateau in recent 
years so far as new developments are 
concerned. This does not mean, of 
course, that important additions, rep- 
lications, and verifications have not 
occurred. For example, a keystone 
in the study of gifted children was 
put in place during this period by 
Terman and Oden’s twenty-five year 
report. 

The reviewer must repeat his ob-
jection to the chapter on the physi-
cal growth of children. He agrees
completely with the author that
“child behavior cannot be understood 
apart from...the physical body 
through which it perceives, reacts, 
and functions, and to which others 
react,” (p. 292); but unfortunately 
he finds here no consideration of this 
problem. Can't we have a chapter on 
the psychology of physical differences 
in children in the next edition such as 
we have on the psychology of sex dif-
ferences in this one?

The chapters of the Manual where 
major changes appear, and the new 
chapters, may be individually men- 
tioned as follows: 

“Methods of Child Psychology” 
(John Anderson). Anderson wrote 
chapters with this title in the 1933 
edition of Murchison’s Handbook and 
the 1946 edition of the Manual.
These, with the present essay, con-
stitute a revealing exhibit of the re-
wards of two decades of preoccupa-
tion by psychologists with the prob-
lems of methodology. Anderson
wrote approximately 11,000 words in
1933, 22,000 in 1946, and 33,000 in
1954, and while the increase in level
may not be quite so marked as the 
increase in quantity, the reviewer 
perceives it as being nearly so. How- 
ever, the scope of the chapter is too 
wide; it is, in fact, a survey of general 
psychological methodology. With a 
few changes in nomenclature and ex- 
amples this essay would be as appro- 
priate in a social or an experimental 
psychology as it is here. One hopes 
that the time will come when the 
general field will have been so well 
covered by others that Anderson can 
devote his attention to special prob- 
lems of methodology in psychological 
research with children. 

“The Environment and Mental 
Development” (H. E. Jones). Jones’ 
review shows that there has been an 
active and profitable period of work 
on this area. He reports a welcome
movement away from global nature-
nurture studies to increased specifica-
tion of the effects of particular en-
vironmental influences. Jones ap-
pears to take the position, without
elaborating and emphasizing it suf- 
ficiently, however, that this move- 
ment must go still further. He says 
in connection with the Iowa studies 
“our chief need in this field is not for 
statistical methods of greater power 
and subtility but for more vigorous 
experimental procedures...” (681).
If he means here, as the reviewer 
thinks he does, the need for better 
definition and control of environ- 
mental variables, he has raised a 
crucial point. In its context, Jones’ 
statement seems to imply that, for 
example, it is not enough to study the 
effects of nursery school attendance 
upon intellectual development; it is 
necessary to define the significance of 
nursery school for children in psycho- 
logically meaningful concepts: perhaps 
in such variables as the stimulation, 
the attractiveness, the freedom, the 
rewards, and the learning opportuni- 
ties of the nursery school situation. 
This points to an important require- 
ment which Jones’ review does little 
to meet. Although his critical evalua- 
tions of the technical adequacy of in- 
dividual studies are excellent, he does 
little to organize and conceptualize 
the kinds of problems in this field. 
All “environmental” influences, rang- 
ing from culture, economics, social 
class, foster homes, hospitalization, 
and schooling to birth order, season of 
the year, nutrition, health, physical 
size, physiological maturity, and race 
are treated on the same level. Surely 
the time will soon come when some 
discrimination and sorting of prob- 
lems within this field will be possible 
in terms of primitive theories of en- 
vironmental variables. 

“Character Development in Chil-
dren—An Objective Approach” (Ver-
non Jones). The author reports many
new investigations of relevance to 
this topic, but the area of understand- 
ing appears to have been increased 
very little in the time since the first 
edition. It is clear that this is an area 
where the multiplying of facts is 
proving of little benefit. New direc- 
tions are needed. 

“Language Development in Chil- 
dren” (McCarthy). This is the long- 
est chapter in the Manual (120 pages) 
with the largest number of references 
(776). It has been reorganized and 
greatly expanded for this edition. 
Extensive new sections cover the 
vocalization of infants and language 
disorders in relation to personality 
development. The problems con-
sidered cover a tremendous range in-
cluding, for example, fetal sounds and
the birth cry; vowel and consonant
sounds in infant vocalization; vocab-
ulary tests; amount and rate of talk-
ing; the function of language in the
child’s life; the effects of institution-
alization and multiple birth upon
language development; the correla-
tion between language and motor,
intellectual, and social development;
language disorder syndromes and
personality development; delayed
speech; articulatory defects, stutter-
ing. Despite McCarthy’s very effec-
tive organization and condensation
(including two four-page summariz- 
ing tables), the amount of material is 
so overwhelming that one leaves this 
chapter with a great weariness. Per- 
haps the basic difficulty is that 
“spoken language” is in itself not a
useful category of behavior; perhaps 
it is only phenotypic, symptomatic 
behavior like hand behavior and foot 
behavior. In any case, to make 
future reviews more effective, some 
means of focusing upon the psycho- 
logically most meaningful and timely 
issues is essential. As it stands, this 
chapter provides a useful guide to a
vast literature.

“Psychological Sex Differences”
(Terman and Tyler). This chapter 
has been largely rewritten for this edi- 
tion of the Manual, and it might well 
serve as a model for the kind of ex- 
position suited to this book: the prob- 
lem of the chapter is clear, the evi- 
dence is not beyond the limits of 
what can be handled in the space 
available, the authors make explicit 
the limits of their review and of the 
bibliographical sources covered, they 
include only those publications most 
relevant to the trend of the evidence 
as they see it. This method reveals 
clearly the status of the problems as 
viewed by the authors. Develop- 
ments in the interval since the first 
edition of the Manual are more ade- 
quate data which confirm earlier 
findings and which reinforce the au- 
thors’ earlier caution regarding possi- 
ble biological and cultural roots of 
psychological sex differences. The- 
ories and speculations about the pos- 
sible lines of influence of these fac- 
tors are not covered. This is a feature 
of the problem which some might 
consider of value. 

“Emotional Development” (Jer- 
sild). This is a completely reorgan- 
ized and greatly expanded survey of 
emotionality in children. Jersild 
places his discussion within an ex- 
plicitly stated conception of emotion. 
He considers not only research find- 
ings, but various theories and specu- 
lations as well. He covers not only 
the well tilled fields, but also such 
newly cleared research areas as joy 
and compassion. It is beyond the 
scope of the present review to evalu- 
ate this chapter in detail. While it 
can doubtless be criticized from many 
viewpoints, it is effective as a struc- 
ture for presenting much of the cur- 
rent state of knowledge and thinking 
in this area, much more so than the 
chapter in the previous edition.

“The Adolescent” (Horrocks). It
is difficult to conceive of two chapters 
with this title more different than 
those of Dennis in 1946 and Hor- 
rocks in 1954. Some measure of this 
difference is found in the fact that 
while between them they refer to 337 
citations of relevant literature dated 
1941 or earlier, only 12 per cent of 
these citations are common to both 
bibliographies. This must carry an 
important lesson regarding the pre-
ciseness with which adolescent is de-
fined and the “reliability” of these
two “observers” of the relevant
literature. While Dennis defined his 
problem as that of the behavioral 
correlates of physiological sexual 
maturation in humans, Horrocks 
ranges much more widely. He de- 
votes six pages to the history of stud- 
ies of adolescence, thirteen pages to 
the physical and physiological aspects 
of adolescence, and nine pages to be- 
havior in adolescence. This is a 
rather elementary, didactic essay on 
many aspects of adolescent problems. 

“Psychopathology of Childhood” 
(Benda). This is a new chapter in the 
Manual; it covers mental deficiency 
(30 pages), personality disorders on a 
higher integral level (eight pages), 
and childhood schizophrenia (five 
pages); it has ninety-four citations in 
the bibliography. It is beyond the 
reviewer's competence to make eval- 
uative comments on this chapter. 

“Social Development” (H. H. and 
G. L. Anderson). Harold Anderson 
has, over a number of years, produced 
an important series of investigations 
of the social behavior of children. 
Here for the first time is a vigorous 
and invigorating statement of the 
theoretical framework of this re- 
search. It is a theory of almost cos- 
mic scope. A sample of the headings 
used by the Andersons includes: The 
Characteristics of Biological Growth; 
Growth of Civilizations; The Place of
Value; Social Development and the
Second Law of Thermodynamics;
Psychological Entropy and Culture;
Theory of Probability; “Scientific”
Prediction and Organization; Disinte- 
gration, Deterioration, and Psychosis; 
The Growth Circle; The Vicious 
Circle; Friendliness, Cooperation, 
and Integrative Behavior; Leader- 
ship; Social Learning. The bibliogra- 
phy refers to such diverse scholars as 
Toynbee, Wiener, Sinnott, Conant, 
Freud, Spranger, Krech, Dennis, 
Sears, and Schneirla. The chapter 
begins with an outline of trends in 
psychological thinking rooted in Dar- 
win, Preyer, and William James and 
ends with the recent investigations of 
Lippitt, Bowlby, and Bronfenbren- 
ner. A theory of such scope is beyond 
the capacity of this reviewer to evalu- 
ate, and the task is not made easier 
by the fact that forty-five pages are 
too few to cover all the details and 
make all the transitions. However, 
this reader found intriguing the world 
view so boldly sketched. Here is the
kind of broad perspective child psy- 
chology hopes some day to achieve, 
and this may be the direction the 
quest will take. It is to be hoped that 
this essay can be elaborated else- 
where as it deserves. 

The present volume exhibits the 
same ambiguities as the earlier one 
regarding its functions. Some chap- 
ters are written as for a text; others 
are scholarly contributions with no 
concessions to immature students; 
still others are literature reviews 
similar to those of the Psychological 
Bulletin. Partly as a consequence of 
these different conceptions the chap- 
ters are uneven in style and level of 
writing. But this variety is also, 
without doubt, a product of the un- 
even development of the different 
areas. Child psychology is, indeed, 
in a ragged, unfinished state and the 
Manual reflects this in all its aspects.
One instance of this is found in the
way the same material is handled in
chapters with overlapping subject
matter. When the same issue is con-
sidered by more than one author, it
is not uncommon to find different 
interpretations of the data. Another 
instance is found in the treatment ac- 
corded two modern giants of child 
psychology: Freud and Piaget. Sev- 
eral authors make a valiant effort to 
include psychoanalytic findings. It is 
obvious, however, that all are un- 
comfortable in their efforts to be 
simultaneously tolerant and scien- 
tific. Psychoanalysis in this book is 
like a foreign body. (One cannot but 
remark that in this respect the second 
edition is far ahead of the first, where 
Freud received little mention, and 
that the Manual is far ahead of most 
books under a psychoanalytical aegis 
which give academic psychology even 
less attention.) Piaget, too, is treated 
as an alien, for reasons less obvious 
to the reviewer. There is no reference 
to Piaget’s important recent publica- 
tions. 

This is a good picture of child psy- 
chology in 1954. The weaknesses of 
the Manual are largely the weak- 
nesses of the science it surveys. A 
good manual of child psychology 
awaits a more mature science of 
child behavior. In the meantime this 
book and, it is to be hoped, its future 
editions provide an important aid in 
achieving this maturity. 

ROGER G. BARKER

University of Kansas 


THRALL, R. M., COOMBS, C. H., &
Davis, R. L. (Eds.) Decision proc- 
esses. New York: John Wiley, 
1954. Pp. viii+332. $5.00. 


A summer seminar in 1952 brought
together economists, mathemati-
cians, psychologists, and a few repre-
sentatives of other fields, all con-
cerned with some aspect of choice
behavior. Each participant commun-
icated the ideas in his field which he
regarded as likely to benefit the 
others, and some of them were stimu- 
lated by the interchange to start new 
research. The 19 heterogeneous 
papers comprising Decision Processes 
were prepared during the subsequent 
year. The book makes evident the 
difficulties in this area and discour- 
ages any expectation of early pay-off 
for the psychologist. 

We may contrast two broad lines of 
interest in decision analysis, the 
normative and the descriptive. The 
normative, found in mathematics, 
statistics, and economics, tries to 
state how a rational being ought to 
act. The descriptive is found in 
psychology, and to an increasing ex- 
tent in economics. Descriptive stud- 
ies examine actual choices, and seek a 
law to predict the choices or a ration- 
alization to “explain” them. Models 
from the normative studies aid in 
rationalization, and the descriptions 
demonstrate, among other things, 
how far normal behavior is from that 
of the postulated rational being. 

Among the normative papers here, 
one series compares the various de- 
cision criteria (e.g., minimax) which 
are currently under study. A second 
series deals with the development, 
from axioms, of utility functions 
to describe individual preferences. 
These papers contain some new 
mathematical developments. Of 
most general interest is Marschak’s 
“Towards an Economic Theory of Or-
ganization and Information” which
deals comprehensively with the value 
of information in making decisions. 
This paper is not as useful an intro- 
duction, however, as Marschak’s 
chapter in Mathematical Thinking 
in the Behavioral Sciences (Glencoe: 
Free Press, 1954, P. F. Lazarsfeld, 
ed.). 

On the descriptive side, we have a
variety of experiments on preferences
among wagers, on probability learn- 
ing, and on coalition-forming be- 
havior. The prize paper in this group 
is a delightfully designed experiment 
by Hoffman, Lawrence, and Fes- 
tinger, demonstrating that coalition 
forming depends not only on the 
ostensible pay-offs to be obtained, 
but also on subtle social attitudes 
among the participants. Bush, Mos-
teller, and Thompson present a com-
plicated stochastic model for learning
in choice situations. Coombs, Raiffa, 
and Thrall write helpfully on the 
place of mathematical models in 
scientific reasoning, and Coombs em-
ploys his “ordered metric” in papers
on social choice and (with Beardslee) 
on wagering. 

This volume seems more a memo- 
randum among the participants than 
a presentation for other readers. 
These reports do not represent sub- 
stantial, consolidated advances, as all 
concerned recognize. There is much 
use of the single-case experiment, or 
of the tentative mathematical for- 
mulation involving assumptions un- 
satisfactory to the author. The pa- 
pers are difficult to read, as mathe- 
matics and as English. Even Estes, 
one of the more lucid contributors, 
writes this appalling sentence: 
“When the model is interpreted in 
terms of the present experiment, it 
turns out that the rate of learning 
(systematic change in probability of 
making a given prediction) depends 
upon the characteristics of the mo- 
mentary environmental situation but 
that over a considerable series of 
trials the probability of making a 
given prediction tends to a stable 
asymptotic distribution with the 
asymptotic mean, for a group of simi- 
lar individuals run under like condi- 
tions, being independent of the mo- 
mentary environmental situation, in 
the present experiment, the nature of
the signal, S.”

Many psychologists are likely to 
learn something from a few chapters 
which are closely related to their in- 
terests, and to which they can bring 
considerable background. For the 
reader wishing a general knowledge of 
the frontier areas represented here, 
one would recommend parts of the 
Lazarsfeld symposium, the recent 
Bulletin paper by Ward Edwards, 
and Irwin Bross’ Design for Decision 
(Macmillan, 1954). 

The eventual impact of utility 
theory on psychology may well be 
considerable, even though investiga- 
tors are not yet successful in inte- 
grating the field. The concept of 
preference for various outcome dis- 
tributions intrudes into the interpre- 
tation of learning experiments, where 
psychologists have generally regarded 
each reinforcement as having an ob- 
jective value. Utility theory shows 
that concepts like level of aspiration 
are oversimplified, and the coalition 
experiments introduce new and pro- 
vocative problems for social psy- 
chologists. The contrast between 
actual choice and rational choice has 
many implications for the psychology 
of thinking, especially as regards its 
social determinants. Talented in- 
vestigators have started to mine in 
what is evidently a good spot, even if 
this first load contains little pay dirt. 

LEE J. CRONBACH 

University of Illinois 


MOSES, PAUL J. The voice of neurosis.
New York: Grune & Stratton, 
1954. Pp. v+131. $4.00. 


In this volume, the author focuses 
on attributes of voice, not on verbal 
communication or speech in the usual 
sense. Resonance, melody, register 
illustrate the elements emphasized. 
Not sharply organized or sufficiently 
substantiated to serve as a firm introduction 
to the method or as a convincing 
summary of research, the 
book nevertheless opens up intriguing 
potentialities. Voice patterns reflect 
peculiarly significant aspects of per- 
sonality; the neurotic becomes a dif- 
ferent voice instrument from the 
schizophrenic; and therapy combin- 
ing voice and dynamic elements is 
indeed a fascinating approach. These 
are the three themes that run through 
the book. 

The major thesis presents voice as 
an expressive technique. Just as a 
graphologist uses handwriting, so 
Moses uses voice in all its varied 
aspects to understand personality. 
He compares a blind analysis with a 
Rorschach, and integrates other analy- 
ses with case material. His ideas 
about sound approaches to interpre- 
tation hold interest for those who use 
projective or expressive methods. 
The aspects of voice he emphasizes 
and his guide lines in analysis suggest 
that he may have hit on one of the 
potentially most fruitful areas of ex- 
pressive behavior—with elements 
subtly modified throughout impor- 
tant stages of life history and remain- 
ing identifiably fixed in adult vocal 
behavior. With ever increasing 
fidelity in recording, voice may be- 
come a major focus in the under- 
standing of personality. 

Roy M. HAMLIN 

Western Psychiatric Institute 

University of Pittsburgh 


NUTTIN, JOSEPH. Tâche, réussite et 
échec: théorie de la conduite humaine. 
Amsterdam: Publications 
Universitaires de Louvain, 1953. 
Pp. x+530. 330 fr. 


Over a period of about fifteen 
years, the author of this study has 
been conducting experiments at the 
University of Louvain that are in- 
tended to show the impact of success 
and failure upon human personality. 
They comprise rather simple experiments 
in perception, learning, and 
memory. In some experiments, sub- 
jects are asked to judge the area of 
geometrical surfaces, or the number 
of people shown in pictures. In 
others, they are asked to recall 
whether two-place numbers 
were added to or subtracted from 
another such number in a previous 
showing, or whether words were 
learned as associated with another 
word or a number. The items are arranged in 
series, and the subject is required to render 
his judgment as each member in 
the series is presented, and then is 
told whether his response is right or 
wrong. In some series the numbers of 
successful and unsuccessful judg- 
ments are equal, in others one or the 
other variety predominates. Subse- 
quently, the subject is questioned re- 
garding matters for which the in- 
structions provided no mental set. 
For the most part, subjects are young 
people of school and college age. The 
results of normals are compared with 
those of manic and melancholic sub- 
jects. The experiments are so de- 
signed as to be safeguarded against 
the usual types of psychological 
error. 

Experiments on learning in which 
some terms are remembered and 
others forgotten inevitably involve 
the question as to how successful re- 
sponses tend to be reinforced. In 
this context, Nuttin undertakes a 
critical examination of the meaning of 
the law of effect. He concludes that 
Thorndike’s view, at least as set forth 
in his later writings, is too mecha- 
nistic. Hull's position is objected to on 
similar grounds. With respect to the 
views of Tolman, he is more tolerant. 
Nuttin expounds a dynamic concep- 
tion which admits of a cognitive fac- 
tor in the stimulus-response sequence, 
and stresses perfection of response 
rather than repetition of behavior in 
the learning situation. Reduction of 
need is designated the essential con- 
dition for reinforcement. He finds 
support for this conclusion in the re- 
cent experiments concerning unfin- 
ished tasks. 

To this reviewer it seems that, 
having performed numerous experi- 
ments on the effect of success and fail- 
ure upon incidental memory in simple 
laboratory tasks, Nuttin has under- 
taken to widen the scope of the bear- 
ing of the experiments in such a way 
as to provide a general discussion of 
personality dynamics. In an introductory 
section under the title “Conduct 
and Result,” he gives a critical 
survey of the chief contributors to 
experimental and clinical theory in 
psychology. Here he develops his 
theory of personality dynamics in 
which the cumulative effects of suc- 
cess and failure are often so devastat- 
ing as to leave traumatic effects. 

While those who subscribe to strict 
objectivity in psychology will be apt 
to deem Nuttin’s appeal to a cogni- 
tive factor in behavior dynamics as a 
regression, those who are disposed to 
stress the bearing of psychology upon 
everyday life will find much to ac- 
claim. Left as an open question, how- 
ever, is the degree to which simple 
laboratory tasks, such as those in 
which a subject estimates the number 
of square centimeters in a triangle or 
the number of people in a picture and 
is told he is wrong, approximate such 
real-life failures as loss of academic 
standing, rejection by an attractive 
person of the opposite sex, or elimina- 
tion from the team in sports. 

Using a simple style that is free of 
the more involved forms of idiomatic 
expression, Nuttin easily conveys not 
only his ideas but also his enthusiasm 
for experiment. By integrating both 
experimental and clinical sources, the 
work is given a broad practical bearing 
which will be of interest primarily 
to specialists in learning and in personality 
development. 
MICHAEL J. ZIGLER 
Wellesley College 


KORNHAUSER, ARTHUR, DUBIN, ROBERT, 
& ROSS, ARTHUR M. (Eds.) 
Industrial conflict. New York: 
McGraw-Hill, 1954. Pp. vii+551. 
$6.00. 


The subject of industrial conflict is 
so broad that one would hope for its 
treatment to be lengthy, interdisci- 
plinary, and to encompass many spe- 
cific topics and viewpoints. As such, 
this book, written under the sponsor- 
ship of SPSSI, meets all one’s expec- 
tations. It sets out with the ambi- 
tious purposes of analyzing the deter- 
mining factors and conditions which 
give rise to industrial conflict and of 
assessing various efforts at solution. 

The book is divided into five main 
parts: (a) Basic issues concerning industrial 
conflict. (b) Roots of industrial 
conflict (motivational analysis, 
organization and leadership of 
groups in conflict, social and eco- 
nomic influences). (c) Dealing with 
industrial conflict (accommodating 
to conflict, efforts to remove sources 
of conflict, social control of industrial 
conflict). (d) Industrial conflict in 
other societies. (e) Industrial con- 
flict; present and future. 

Thirty-nine different authors, in- 
cluding academicians, writers, labor 
leaders, and industrial representa- 
tives, have contributed to the book’s 
forty chapters. With such a hetero- 
geneous group, it is inevitable that 
varying opinions will be found. How- 
ever, this adds to the book’s value by 
providing a variety of viewpoints and 
by highlighting some of the unsolved 
issues. It is exciting to see this inter- 
disciplinary approach being applied 
increasingly to industrial problems. 
Certainly such an approach must be 
satisfying to all “problem-oriented” 
industrial psychologists who have 
long recognized that their medicine 
bag does not always contain the 
optimum solution, that many of their 
tasks are not purely psychological in 
nature but traverse numerous disciplines. 

That the book covers a timely topic 
is obvious—the tables showing man- 
days idle as a result of strikes are in 
themselves sufficient proof. That the 
book asks more questions than it 
gives answers, that it presents more 
suggestions for further research than 
conclusions from past research, is 
also true. But it also offers a comprehensive 
picture for the “serious-minded 
public” who want a general 
overview of the relations between 
labor and management groups and 
among individuals in the industrial 
setting. 

A final word about the authors. In 
these days when an increasing number 
of “edited” books turn out to be 
reprints of journal articles plus a few 
introductory remarks, it is refreshing 
to find a book where each author has 
not only contributed something new 
to the field but has done so with 
no monetary remuneration—all pro- 
ceeds from the book go to the SPSSI. 
JEROME H. ELY 
Dunlap and Associates, Inc. 


SCHAFER, Roy. Psychoanalytic in- 
terpretation in Rorschach testing. 
New York: Grune & Stratton, 
1954. Pp. xiv+446. $8.75. 


Skinner teaches an alert pigeon to 
peck a bulls-eye in five minutes, by 
first reinforcing any approximate suc- 
cess. Struggling with a much more 
challenging puzzle, the book reviewed 
here is no bulls-eye, but may be 
hailed with cautious enthusiasm as 
the most encouraging near miss of its 
kind yet published. 

The author presents a detailed attempt 
to establish feedback between 
painstaking observation of complex 
individual behavior and general 
“laws.” The fact that some of the 
laws are tentative or dubious need 
not be emphasized with undue dis- 
tress. The attempt itself hits at the 
core of the projective problem. The 
Rorschach was never a test in the 
Binet tradition, simplified by design 
to point up specifically what should 
be counted. The inkblots call forth 
behavior which retains a high degree 
of uniquely individual complexity. 
To look at such behavior and start 
counting (M, H, D), or testing fruit- 
less hypotheses, is easy. To tease out 
a pattern, theme or process that con- 
stitutes a meaningful unit is a prob- 
lem that has baffled both clinicians 
and statisticians. Those clinicians 
who seem to have the art have had 
little success in writing the method. 
Statisticians like R. B. Cattell dis- 
dain the ‘inventive’ response of the 
projective method, throw up their 
hands, and say: What we want is a 
traditional model test of dynamisms! 

Schafer’s approach involves chiefly 
three elements: (a) a vocabulary or 
classificatory system taken from the 
psychoanalytic terminology for ego 
defenses; (b) the use of judgment in 
teasing out units, with this judgment 
based on a background of empirical 
observation, experimental evidence, 
and thoughtful speculation; and (c) 
a rough check back of these broad 
units against other empirical evidence 
(descriptive case material). This still 
crude approach is not new, but 
Schafer’s formulated approximation 
represents a step forward in specific- 
ity and scope. 

The author’s bias or biases need 
not be approved with equal enthusi- 
asm. Specific biases that mar some 
chapters and some elements through- 
out the book can be mentioned only 
with an important reservation: as 
stated here they represent 90 per cent 
the reviewer's projections and only 
10 per cent the author's attitudinal 
style. Bluntly, however, on some 
pages the author does seem to feel 
that (a) the well-adjusted (analyzed) 
psychologist in a medical setting 
should accept the healthy masochistic 
role of a second-class citizen; (b) 
an expert is more someone with wide 
and approved experience than some- 
one who should be asked to produce 
expert evidence; (c) psychoanalytic 
theory may not be the ultimate final 
word, but it is the current final word 
as far as party line handed down to 
second-rate citizens is concerned; and 
(d) usually anything the Rorschach 
reveals is best understood if labeled 
with a derogatory word (infantile, 
sadistic, compulsive). 

Actually the author struggles con- 
sistently against such biases: lauding 
solid evidence, rejecting Rorschach’s 
fascinating notes on the inkblots as 
the final word in this area, and recognizing 
that there is something “fundamentally 
neurotic” about reporting 
all observations in terms of derogatory 
value judgments. Yet his own 
problems of professional and scientific 
identity peek through. 

He nevertheless succeeds in setting 
forth the general outline of a process 
of clinical judgment, or “intuition,” 
that makes sense. Successful judg- 
ments may involve the cancelling out 
of many details based on false as- 
sumptions, self-deception, and ini- 
tially loose speculation. The relaxed 
acceptance of all elements, good and 
bad, checked then with critical rigor 
against general guide lines which 
Schafer calls theory, may be impor- 
tant. The general feel for such a proc- 
ess is conveyed by Schafer’s book. 
The further analysis of such judg- 
ment processes is important to psy- 
chology, as a field of study and as a 
research tool. The pattern of this approach 
may lead to more fruitful 
progress than a pattern based from 
the beginning on an unimaginative 
reading of the APA’s Technical Rec- 
ommendations for Psychological Tests. 
Roy M. HAMLIN 
Western Psychiatric Institute, 
University of Pittsburgh 


TAYLOR, W. S. Dynamic and abnormal 
psychology. New York: 
American Book Co., 1954. Pp. 
xiv+658. $5.50. 


According to a statement in the 
author’s preface, this book was de- 
signed primarily as a textbook for 
courses in abnormal psychology. It 
is my impression, however, that it is 
unlikely to win a wide acceptance. 
My reasons for this judgment follow. 

It would appear that this book has 
grown from Professor Taylor’s own 
course in the subject and no doubt 
will fit it admirably. However, his 
course seems rather unique, at least 
so far as my knowledge of other 
courses and texts in abnormal psy- 
chology goes. It is rare that learning, 
action, thought, and connector proc- 
esses are discussed in a course in ab- 
normal psychology and treated there 
much as they would be in general 
psychology. Yet five of the 19 chap- 
ters deal with these topics, and per- 
haps half the book is devoted to 
matters not ordinarily found in 
widely used current texts. As a 
matter of fact, only one chapter is 
devoted to the major behavioral dis- 
orders—neuroses, psychoses, and cer- 
tain other categories. 

More importantly, the several 
chapters do not seem to hang to- 
gether in a compellingly coherent 
way. Having discussed action, for 
example, the author seems not to 
make use of this discussion in later 
treatments of other topics. More- 
over, many of the subjects introduced 
receive so scanty a discussion as to be 
unintelligible to the naive reader and 
simply uninformative to the moderately 
sophisticated reader. Illustrative, 
though not representative, is 
the section entitled “Psychological 
Aids to Diagnosis and Prognosis.” 
In two and a fraction pages of actual 
text, ten topics are discussed, ranging 
from word association tests and reac- 
tion times to projective techniques. 

Professor Taylor has obviously 
read widely, and the book contains 
numerous quotations from literature 
and anecdotes of one sort or another 
drawn from a wide variety of sources. 
But there is abundance here, to the 
point often of excessive redundancy, 
and actual confusion as well as flag- 
ging interest is apt to be the result. 

Although I feel certain that Pro- 
fessor Taylor has one, the book does 
not convey to me a theory or a co- 
herent picture of the dynamics of ad- 
justment or of abnormality. It ap- 
pears to be eclectic, but amorphously 
so. 

It remains to be said that there are 
many things of value in this volume. 
The abundant case materials, the 
many discussions of problems and 
questions, such, for example, as mul- 
tiple personality, suggestion, free 
will, and other matters no longer fre- 
quently encountered in our texts, the 
history of abnormal psychology, and 
the over-all attempt to employ the 
categories of general psychology in 
the special field of dynamics and 
abnormality are all points of interest 
and worthy of commendation. As a 
textbook for general use, however, I 
do not feel that it can be recom- 
mended. 

CHARLES N. COFER 

University of Maryland 


THORPE, LOUIS P., & SCHMULLER, 
ALLEN M. Contemporary theories 
of learning. New York: Ronald 
Press Co., 1953. Pp. viii+480. 
$5.50. 


The authors indicate that “It is the 
purpose of this textbook for univer- 
sity and college students to explain 
the most important theories of learn- 
ing in the clearest and simplest possi- 
ble language, to show the relevance 
of each of them to the educational 
process, and to point out that in 
spite of the many conflicts between 
these theories they have a common 
ground upon which can be based an 
intelligible pattern of classroom procedure” 
(p. v). Each aspect of this 
purpose is in itself a large and im- 
portant undertaking that makes quite 
different demands upon the authors 
and requires different evaluation criteria. 
Two responsibilities are assumed 
in accomplishing the first part 
of their purpose: (a) to indicate their 
criteria for selecting material from a 
theory, and (b) to explain each theory 
accurately. As the chapters are written 
the student is likely to infer that 
the volume presents unabridged theories 
of seven men (Thorndike, Guthrie, 
Hull, Skinner, Wheeler, Tolman 
and Dewey) and two positions— 
Functionalism and Gestalt. Since no 
selection criteria were indicated, the 
authors are open to the criticism that 
they have promised more than they 
have delivered. Also, Thorpe and 
Schmuller have not always been ac- 
curate. For example, they failed to 
differentiate statements about the 
theorist’s program and metatheory 
from those pertaining to the theory 
proper. They inferred from Hull’s 
preoccupation with biological sur- 
vival and adaptation that these are 
an integral part of his system. Hull, 
anticipating confusion on this point, 
wrote that “adaptive considerations 
are useful in making a preliminary 
search for postulates, but that once 
the postulates have been selected 
they must stand on their own feet.” 
Adaptation intrudes as a persistent 
theme in their discussions of most 
theories. For example, they say “By 
learning Guthrie refers to such be- 
havior (acts) as will assist the organ- 
ism in making necessary adaptations. 
From this point of view learning is an 
additive process in that something 
helpful to him always accrues to the 
learner as a result of it” (p. 97). This 
statement also is incorrect, for Guthrie 
has indicated that the “common-sense” 
definition of learning as contrasted 
with a “scientific” one such 
as his own “applies learning only to 
the attainment of good results and we 
shall find that we acquire bad habits 
and tendencies to failure in exactly 
the same way in which we acquire 
good habits and tendencies to success.” 
The authors in purporting to 
restate Thorndike’s law of effect say, 
“Stated simply, this law holds that 
the more one utilizes certain neural 
pathways—assuming that they exist 
as realities—the stronger become the 
bonds” (p. 52). 

A fundamental difficulty is inher- 
ent in the approach taken by the au- 
thors to accomplish their second pur- 
pose. At the present time it is not 
possible to apply entire systems of 
learning to problems because they are 
not highly developed, logically inte- 
grated sets of axioms and postulates. 
However, application can take place 
at the level of the more limited special 
theories, most of which are compo- 
nents of a general theory. 

When special theories are applied, 
three factors appear to determine 
their utility: (a) the language in 
which the theory is stated; (b) the 
reference experiments employed by 
the theorist; and (c) the particular 
use to which the theory is to be put. 
The first two are the responsibility of 
the theorist; the third is the choice of 
the practitioner. Both have neglected 
their responsibilities. Learning theorists 
have not always provided adequate 
interpretations of their theoretical 
terms, making it impossible to 
apply parts of their theory in the 
sense of deducing relationships that 
should hold among the elements of 
any new situation. Those who wish to 
apply an existing special theory gen- 
erally have not translated the ele- 
ments and relationships in the new 
situation into terms used in stating 
the theory. This is a necessary pre- 
requisite for the accomplishment of 
the desired rapprochement between 
theory and practice. Since this was 
not done by Thorpe and Schmuller, 
it is not surprising that their final 
summary of the areas of agreement 
among learning theorists provides the 
educator with nothing he did not al- 
ready know about classroom pro- 
cedure. 

There is another consequence of be- 
ginning with well-known “general 
theories” rather than with a psycho- 
logical analysis of particular educa- 
tional problems. Some possible ap- 
plications were overlooked. Exam- 
ples include inhibition theory, which 
could be related to problems of motor 
skill performance in learning a trade; 
Gibson’s stimulus generalization hy- 
pothesis in verbal learning; Cofer and 
Foley’s, and also Osgood’s, mediation 
hypotheses about language behavior; 
and Hovland’s application of “information 
theory” to concept learning. 

An important oversight is the omis- 
sion of references to either the tables 
or figures in the text. Almost none of 
the 55 figures or 11 tables is referred 
to in the text. 

LAWRENCE M. STOLUROW 

University of Illinois 


GRUENBERG, SIDONIE M. (Ed.) The 
encyclopedia of child care and guid- 
ance. New York: Doubleday, 
1954. Pp. 1016. $7.50. 


A most distinguished editorial 
staff, advisory board, and large 
roster of contributors were involved 
in producing this massive book. Part 
I, the first two-thirds of the book, 
contains more than 1,000 alphabeti- 
cally arranged, cross-referenced en- 
tries followed by a classified list of 
agencies and organizations and an 
annotated list of further readings. 
Part II consists of 30 excellent chap- 
ters on various aspects of child de- 
velopment and the social forces af- 
fecting children. 

The encyclopedia portion of the 
book is consistently addressed to the 
stereotype of a literate but com- 
pletely naive parent. The style is 
chatty and nontechnical while the 
pages are decorated with an abun- 
dance of line drawings, mostly of chil- 
dren. Inasmuch as almost any topic 
is likely to appear (e.g., Pediatrician, 
Pediatrics Clinic, Pediculosis Capitis, 
Peek-A-Boo, Pellagra, Pelvic Exami- 
nation, Pelvis, Penicillin, Percussion 
Instruments, Period, Periodicals for 
Children, Permanent Waves, etc.), 
the book is guaranteed to furnish 
even the most professional reader 
with interesting bits of information. 
Generally speaking, however, Part II 
will be of greater interest to psycholo- 
gists as a popular but comprehensive 
review of current thinking. 

Certainly few will quarrel with the 
repetitive message: Be loving and 
patient but seek professional help 
when a real problem exists. The re- 
viewer foresees little competition for 
Dr. Spock (himself a contributor) at 
35¢ or even Dr. Carmichael at $12.00. 

LEONARD S. KOGAN 

Institute of Welfare Research 

Community Service Society of 
New York 


SONNEMANN, ULRICH. Existence and 
therapy. New York: Grune & 
Stratton, 1954. Pp. xi+372. $5.00. 


This will doubtlessly prove to be 
an exceptionally difficult book for the 
average psychologist to understand, 
whether he be clinician or experimentalist. 
Not only is Dr. Sonnemann’s 
manner of writing exquisitely 
involved; but his ideas, aside from 
the violent emotional reactions they 
are bound to arouse in most scientifically 
minded readers, almost defy 
any purely intellectual comprehen- 
sion. In Dr. Sonnemann’s own exis- 
tentialist view, these ideas, like all 
human knowledge, must presumably 
be experienced-as-being, or known- 
in-themselves, before they can be fac- 
tually analyzed or “understood.” 
They must be accepted, as it were, on 
faith—faith in man’s being or exist- 
ence. This may be so; and certainly 
logical positivists, who are positively 
anathema to dyed-in-the-wool exis- 
tentialists, would never argue that it 
may not be so, but merely that the 
question of whether or not it may be 
so, being essentially unprovable by 
empirical observation, is a meaning- 
less one. The question is: Why, be- 
lieving as he does that truth or knowl- 
edge can only be directly experienced 
by man as part of his essential humanity 
or being, does Dr. Sonnemann 
(and his fellow existentialists) go to 
the trouble of long-windedly explain- 
ing to his fellow psychologists, in 
what suspiciously appears to be a 
highly analytic manner, why existen- 
tialism is far superior to every other 
system of psychology and philosophy 
ever known to man? 

Existence and Therapy is a thor- 
oughgoing discussion of the existen- 
tialist viewpoint of Heidegger, Jas- 
pers, Binswanger, Boss, and other 
recent European philosophers and 
clinicians. It attempts to make 
mincemeat of empiricism, objectiv- 
ism, experimentalism, Freudianism, 
Gestaltism, Jungianism, and vir- 
tually every other influential con- 
temporary way of thinking and 
therapizing. In so doing, it makes 
some telling points, particularly in 
relation to some of the shortcomings 
of orthodox Freudians and Gestalt- 
ists. 

It is on the more positive side that 
Dr. Sonnemann’s detailed presenta- 
tion of daseinsanalytic theories seems 
to have much less to offer. Where 
these theories overlap with neopsy- 
choanalytic and holistic views of man 
as a total autonomous-autochtho- 
nous organism who ceaselessly inter- 
acts with other human beings and the 
external world, and who must be un- 
derstood and therapized in the light 
of his biological-social origins and de- 
velopment, existentialism seems emi- 
nently sane. But where it goes off, 
and often, into concepts of being and 
the naught, existence as being-toward-death, 
the who of existence, 
the universality of the love norm, 
etc., daseinsanalysis appears to 
become, at least to this empiricism-biased 
reviewer, distinctly unobjective, 
mystical, moralistic, and 
tautological. Its criticisms of other 
viewpoints sometimes make good 
sense; but its own espousals often 
seem to be double-talk. 

In any event, Dr. Sonnemann has 
herewith given us the first quite detailed 
and reasonably definitive English-language 
explanation of the daseinsanalytic 
philosophy. For all its 
difficulty and blood-pressure raising 
potentiality, his book is replete with 
provocative thoughts that certainly 
deserve a hearing. 
ALBERT ELLIS 
New York City 


PENNINGTON, L. A., & BERG, IRWIN 
A. (Eds.) An introduction to clini- 
cal psychology. (2nd Ed.) New 
York: Ronald Press, 1954. Pp. 
viii+709. $6.50. 


The student introduced to clinical 
psychology through this excellent revision 
of an established text would be 
well introduced indeed and he should 
be both impressed and attracted to 
further study. He certainly could 
conclude that clinical psychologists 
are concerned with many things, from 
myopia to ethics. He should gain as- 
surance of finding a compatible role 
in the field whether he is attracted 
most by the opportunity to carry out 
research or to apply an art in the in- 
terest of humanity. He would, of 
course, be forewarned that the clinical 
psychologist is expected to burn both 
ends of the science-service candle, 
but it will be apparent to him from 
reading the text that his mentors 
often burn more brightly at one end 
than the other. 

The involved student might have 
trouble with Cattell’s introductory 
chapter, particularly if he rereads it 
after seeing what the other 29 authors 
have to say. Cattell doesn’t seem to 
know his own mind, and his ideas 
sometimes seem captive to his 
phrases. He states that the clinical 
psychologist should limit his concern 
to “those who for various reasons 
have failed in life’s educational process” 
(p. 4), and that “the essential 
purpose of the profession is to guide 
the sick back to health” (p. 21). 
Thus the “ideal practitioner” of clinical 
psychology should have medical 
training in addition to his doctoral 
study in psychology. Finally, for “industrial 
personnel work, vocational 
guidance, and educational psychology, 
the basic and central type of 
training and qualification needed is 
that of the clinical psychologist” (p. 
20). He is impressed by the derivation 
of “clinical” from “bed,” and it 
is left to the other authors to disregard 
etymology and provide operational 
definitions indicating a much 
wider scope for clinical psychology. 
Furthermore the authors of chapters 
2 to 25 seem unaware that they are to 
be saved by factor analysis. A suggestion 
to the editors for the third edition: 
keep Cattell, for he is always 
stimulating, but get two or three 
others to write introductory chapters 
entitled “The Meaning of Clinical 
Psychology.” 

This second edition is substantially 
a new book. A hundred pages have 
been added. Ten new chapters have 
been included, six have been dropped, 
and the remainder revised considerably. 
Notable improvement has been 
achieved in presenting treatment 
procedures. The editors have sought 
unity in a volume likely to fall apart 
by encouraging the authors to keep 
problem centered rather than tech- 
nique centered. Most authors suc- 
ceed in doing this; the chapters by 
Shoben, Mowrer, Garner, Dorcus, 
and Pennington come to mind. Berg 
gets involved in technique to the 
point of warning against the distrac- 
tion in an interview of a cluttered 
desk or of bright objects in the border 
of the visual field. Sargent and 
Hirsch supply a solid chapter on projective 
methods, possibly more useful 
to doctoral students reviewing for 
exams than intriguing to the neophyte. 
In this edition, a chapter on 
the professional relationships of the 
clinical psychologist is replaced by 
one on research. If the choice is be- 
tween the two, the focus on research 
is certainly to be preferred. But it 
would be useful to the beginning 
student to know something of the 
professional problems and responsi- 
bilities of the clinical psychologist. 
The writing is good throughout the 
book. Some chapters seem a bit pon- 
derous (like that of Saslow, Guze, 
and Matarazzo), some simple and 
highly effective (like that of Mc- 
Candless), and some brilliant (like 
that of Mowrer). A great deal of care 
has gone into the selection of the bibliographies. 
They should be genuinely 
useful to the beginning student. 
The publishers might have been more 
generous in their provisions for illus- 
trations, since introductions are bet- 
ter remembered, or at least enjoyed 
more, when the subject is charming 
to the eye. Clinical psychology has 
more to offer visually than visual 
acuity test charts. Biographical 
notes on the authors, highlighting 
their work and thus exhibiting what 
clinical psychologists do, might add 
to the value and interest of the book 
to the beginning student. 

Because of the diligence of the edi- 
tors and the excellence of the authors, 
clinical psychology now has an even 
better introductory text than the 
good one it has had for the past six 
years. 

NICHOLAS HOBBS

Peabody College 


KING, H. E. Psychomotor aspects of
mental disease: an experimental 
study. Cambridge: Harvard Uni- 
ver. Press, 1954. Pp. xiv+185. 
$3.50. 


The current emphasis on projective 
techniques has unfortunately tended 
to restrict experimental investiga- 
tions utilizing other procedures in the 
study of mental patients. In demon- 
strating a close relation between se- 
verity of behavior disorder and psy- 
chomotor performance, as measured 
by such “old-fashioned” tests as re- 
action time, speed of tapping, and 
finger dexterity, King has contrib- 
uted a quantified technique for 
measuring the current mental status 
of the patient. Indirectly, his find- 
ings emphasize the importance of the 
“open” approach in psychodiagnostic 
work. In the long run progress de- 
pends on the development of new 
techniques rather than on the ultra- 
refinement of available tools. 

JAMES D. PAGE

Temple University 










THELEN, HERBERT A. Dynamics of
groups at work. Chicago: Univer. of
Chicago Press, 1954. Pp. x+379.
$6.00.


The study of the group has always 
been a major and is still an increasing 
concern of social psychology and 
sociology. And a great many social 
psychologists (some labeled psycholo- 
gists and some sociologists) have 
contributed to what we now know 
concerning the nature of the human 
group. Among more recent contribu- 
tors, Moreno, Lewin, and their fol- 
lowers have been prominent, even 
though their contributions are not 
quite as unique and fundamental as 
their followers persist in claiming. 

The intellectual offspring of Lewin 
include not only social psychologists, 
who have added to our knowledge of 
groups and who have shown useful 
ways to apply this information to the 
solution of practical problems, but 
also a growing social engineering cult. 
This cult has taken and sanctified 
the words of Lewin and has attrib- 
uted to him and his followers not only 
their own substantial findings but 
also other theories and techniques 
common among organizational and 
human relations specialists. These 
cult members brush up, simplify, and 
give freshened terminology to such 
items in order to make them readily 
saleable to civic leaders, social work 
directors, education administrators, 
and business managers. They offer 
a new dispensation from those mysterious
creatures, the “scientific social
psychologists.” Thelen’s book is a
new and rather comprehensive ex- 
egesis on this cult’s materials. 

A science finds its application in 
the form of technologies. Persons 
functioning as technicians or engi- 
neers carry scientific findings into 
shops and neighborhoods and clinics 
and board rooms. And as the recent 
history of specialists in the various
fields demonstrates, engineers quickly 
seek professional reinforcement in 
cult-like groups and organizations. 
Such groups have a pressing “mission”
not too well understood and
appreciated by the uninitiated or 
even by the scientists from whom 
their knowledge derives. They also 
have initiation requirements and 
rites, revered leaders, and technical 
skills dependably applied only by 
cult members. For the cult to grow 
sturdily, too, through the increasing 
production of recognized and recog- 
nizable members, its materials must 
be standardized and rather precisely 
communicable. As Thelen demon- 
strates by his book, the group dy- 
namics cult has all these character- 
istics. 

The unique efforts of the scientist 
flourish best in the hands of indi- 
vidualists, but the fate of technicians 
and engineers is a joint one. It de- 
pends to a large degree upon joint 
success in jockeying their secular 
order of specialists into positions of 
prestige, control, and thus power. In 
areas where specialties more obvi- 
ously overlap and thus more sharply 
compete, such as in human relations, 
such joint cult-like efforts to cope 
with competition become all the more 
mandatory. 

This development of a “new” kit
of “tools” for management apparently
results not only in the caricaturing
of tested social psychological
theories but also in a striking switch 
in basic values served. The cari- 
caturing is evinced in Thelen’s twelve 
lists of guidance principles for group 
workers and groups. These are based 
far more upon an unquestioning
acceptance of a typical middle class 
societal surrogate’s version of current 
societal morality than upon scientific 
findings concerning groups. The bare 
mention of social classes and the failure
to grasp their significance in personality
and in group functioning is
a part of this problem. The switch 
in values appears in recurrent em- 
phases by Thelen upon manipulation 
to achieve morally preconceived goals 
(rather than upon scientific discovery 
of the nature of group members, goals, 
and processes) and upon the service 
of those concerned with societal sta- 
bility (rather than upon the stimu- 
lation of democratic processes of so- 
cietal change). 

Thelen’s first six chapters outline 
and illustrate six “technologies,”
which, in his view, while differing,
have behind them “fundamental
similarities.” All six are apparently
endorsed. Yet in their outlines, one 
finds such statements as these: Behavior
“is relevant and useful when
it contributes to getting the job done, 
and it is hindering or nonuseful to the 
extent that it is a response to ideo- 
logical, class, or racial factors.” 
(Thelen apparently stereotypes “ideological,
class, or racial factors”; he
does not reveal an appreciation of 
their implications.) “Neighbors were
no longer seen as people one could 
identify with; the sense of common 
cause was lost.” (He is handicapped
by sentimental connotations of the 
word, neighborhood.) “The role of
the student is determined by the will
of the teacher—with some qualifica- 
tion by the standards of the peer and 
family groups to which the student 
belongs.” (Thelen apparently has not 
studied how peer and family group 
participation orients participation in 
subsequent groups and how class and 
ethnoid factors alter such participa- 
tion patterns.) 

As I studied this book, I constantly
found myself seeing an authoritarian 
dean of students read it, nod his head 
in satisfaction, and then go forth and 
quote formulas from it to justify the 
disruption of democratic student ex- 
periments. When the princes had 
such as Machiavelli, propaganda 
analysts for the common man had it 
difficult enough. Now that they have 
such polished social engineers as 
Thelen who sincerely work “for the
development of the ‘humane community’
toward which man’s nature
. . . is driving him,” analysis becomes
all the more difficult and efforts to 
communicate it all the more confus- 
ing. But I trust I am overrating the 
group dynamics cult! Fortunately 
society has many antitoxins against 
its continued successful manipula- 
tion. Some of them may work slowly, 
but they work. 

ALFRED McCLUNG LEE

Brooklyn College 








EDITORIAL NOTE 


As approved by the Board of Di- 
rectors and the Council of the APA, 
beginning with January 1956, a new 
journal will be published by the As- 
sociation. This journal will review 
books, monographs, films and re- 
lated publications—a function pres- 
ently performed by four different 
APA journals. 

Accordingly, book reviews will ap- 
pear in the Psychological Bulletin only 
through the completion of the present 
volume, 52, November 1955 issue. 

Hereafter all publications sub- 
mitted for review and requests to pre- 
pare reviews should be directed to the 
editor of the new journal, Contemporary
Psychology, A Journal of Reviews:


E. G. Boring, Editor 
Memorial Hall 
Harvard University 
Cambridge 38, Mass. 


The editors wish to take this op- 
portunity to express their consider- 
able gratitude to the cooperating 
book reviewers. Any contribution 
that this department may have made 
can be attributed only to the work of 
the reviewers, which we hope has not 
been a case of love's labour lost. 

E. G. 
W. D. 